Demonstration of Setting Encompassing Weights

by Justin Skycak (@justinskycak) on


Link to Podcast


Encompassing weights control how much spaced repetition credit is propagated backwards from a more advanced topic to a simpler prerequisite topic when a student does a spaced repetition on the more advanced topic. Setting them is tedious, and it sucks, but it's completely necessary. That's sometimes what you’ve got to do when you want to build a solution that actually solves a problem. You have to put in the hard work.

Want to get notified about new posts? Join the mailing list and follow on X/Twitter.





Summary

In preparation for the imminent release of our Probability & Statistics course, I have a bunch of encompassing weights to set today, so I figured I’d take this as an opportunity to make a little video demonstrating how it’s done. (tip: 1.5x speed)

(The encompassing weights control how much spaced repetition credit is propagated backwards from a more advanced topic to a simpler prerequisite topic when a student does a spaced repetition on the more advanced topic.)

If this seems tedious… yeah, it is. I’ve set over 10,000 of these weights, and it’s sucked as much as it sounds: it’s not a particularly fast process, it takes quite a bit of work, and it’s cognitively taxing.

But hey, that’s sometimes what you’ve got to do when you want to build a solution that actually solves a problem. You have to put in the hard work.

Of course you want to work smarter when possible, but sometimes, even after you’ve approached the problem as cleverly as possible, pushing the boulder across the finish line still requires human effort.

So you suck it up, put in that human effort, feed it to your model, and now your model knows what’s going on and can make good decisions.

Transcript

The transcript below is provided with the caveat that there may be occasional typos and light rephrasings. Typos can be introduced by the process of converting audio to a raw word-for-word transcript, and light rephrasings can be introduced by the process of smoothing out natural speech patterns to be more readable as text.

∗     ∗     ∗

I thought it would be interesting to show a bit of the process for setting data in the knowledge graph, particularly encompassing weights. I talk about this sometimes—setting prerequisites, setting encompassing weights.

Alex sets prerequisites; I set encompassing weights that determine how much review credit should be propagated back to subskills in our knowledge graph. When a student completes an advanced skill, how much credit do they get for lower skills in the graph? Spaced repetition credit.

I figured it would be interesting to go through an example of actually figuring out and setting this data. We have a lot of material in our probability and statistics course that we’re refining, trying to get that course done in the very near future. It’s coming soon. You can ask Alex about that for real-time updates.

I have a bunch of topics that are pretty much done, ready to go out, and I need to set this metadata for them, mainly the encompassing weights. I collect the topic IDs from here. This one says topic ID 3273—you can see it right up here. I put those into my command-line tool, and it spits out some information for me.

These are the weights I need to set. Right now, the output is a little messy because I also have some other weights to set, and it’s just telling me I need to set a bunch more of those. I collect this information into a CSV for all these topics and put that CSV into Google Sheets to make it easier to manage.

Each of these rows is a topic and a prerequisite topic. It might be a direct prerequisite, or it might be a key prerequisite, meaning it’s several steps back in the knowledge graph but particularly relevant to a certain problem we’re trying to have the student solve.
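As a rough illustration of that spreadsheet (the actual export format and column names of the internal tool aren’t shown here, so everything below is hypothetical), the rows could be loaded and filtered down to the ones that still need a weight:

```python
import csv

# Hypothetical file and column names; the real CSV comes from an internal
# command-line tool and may be structured differently.
with open("encompassing_weights_todo.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Each row pairs an advanced topic with one of its prerequisites (direct or key)
# and has a weight cell that still needs to be filled in by hand.
unset = [row for row in rows if not row["weight"]]
for row in unset:
    print(f'{row["topic_id"]} {row["topic_name"]} <- {row["prereq_id"]} {row["prereq_name"]}')
```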

I need to determine the weight for each relationship. If a student solves a problem on this topic—mean and variance of the negative binomial distribution—how much implicit review credit do they get for solving a problem in this topic—expected values of discrete random variables?

We have to tell the spaced repetition model: when a student solves this problem, do they get a full problem’s worth of spaced repetition credit on this topic as well? Do they not get any credit at all? Or is it somewhere in between—0.5 credit, 0.2? What is it?

These weights go in this column and are all between 0 and 1. A weight of 0 means there’s no review credit. The advanced topic isn’t fully leveraging the subskill in a way that counts as practice—it’s just contextual knowledge. A weight of 1 means the subskill is being fully leveraged, almost like direct practice. There’s a whole fuzzy spectrum in between.
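As a rough sketch of what those weights end up doing (illustrative only; the names and update rule below are mine, not the actual production logic, and the example weights are the ones worked out later in this walkthrough), credit propagation looks something like this:

```python
# Illustrative toy model of propagating spaced-repetition credit backwards
# through encompassing weights. Not the actual production implementation.

ENCOMPASSING_WEIGHTS = {
    # advanced topic -> {prerequisite topic: weight between 0 and 1}
    "Mean and Variance of the Negative Binomial Distribution": {
        "Expected Values of Discrete Random Variables": 0.0,   # prerequisite, but not practiced here
        "Mean and Variance of the Geometric Distribution": 0.5,  # fuzzy encompassing
        "Introduction to Functions": 1.0,                        # fully exercised by plugging into the formula
    },
}

def implicit_review_credit(completed_topic, credit=1.0):
    """Return the spaced-repetition credit propagated back to each prerequisite."""
    weights = ENCOMPASSING_WEIGHTS.get(completed_topic, {})
    return {prereq: w * credit for prereq, w in weights.items() if w > 0}

print(implicit_review_credit("Mean and Variance of the Negative Binomial Distribution"))
# {'Mean and Variance of the Geometric Distribution': 0.5, 'Introduction to Functions': 1.0}
```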

Let’s jump into it. I’ve got mean and variance of the negative binomial distribution. I have our lesson up there, and I’ve also got the prerequisite, expected values of discrete random variables.

The question I’m considering is, if a student solves a problem from this topic, where the problems are similar to the worked examples, how much credit do we give them for solving a problem on expected values of discrete random variables? Looking through here, one thing I notice is that these problems don’t actually compute the expected value from first principles. The expected value here is specific to this particular distribution and is presented in the topic as a formula for the expected value in this particular case.

It’s useful to know, and it’s good to practice, but it’s not leveraging the subskill of computing an expected value from first principles—taking an arbitrary probability distribution and computing the expected value. You definitely want to know this topic beforehand. You need to know how to compute an expected value and understand what it means, but this topic exercises a special case of a particular distribution.
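To make that distinction concrete, here’s a small sketch (example numbers are mine) of computing an expected value from first principles versus plugging into the special-case formula:

```python
# First principles: E[X] = sum of x * P(X = x) over the support,
# for an arbitrary discrete distribution.
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}  # arbitrary example distribution
print(sum(x * p for x, p in pmf.items()))  # 1.7

# Special-case formula: for a negative binomial counting failures before
# the r-th success, E[X] = r(1 - p)/p. No sum over a distribution at all;
# just plug in r and p.
r, p = 3, 0.4
print(r * (1 - p) / p)  # 4.5
```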

That means the encompassing weight is zero. It’s a prerequisite, but this topic isn’t actually getting practice when solving problems here. Another way to think about it is that if you only solved problems from this topic forever, would you forget how to compute expected values from first principles? Yes, probably, because this topic focuses on applying the formula, whereas the prerequisite topic focuses on computing expected values in a general case.

A student solving problems here shouldn’t get credit for the prerequisite. They should have to solve problems on the prerequisite topic directly. They can’t get implicit credit because they’re not actually using the skill comprehensively enough.

I’m guessing it’s a similar case for variance. We have another prerequisite topic: variance of discrete random variables. It’s probably the same situation.

In the variance topic, we’re computing it from first principles—given an arbitrary distribution. In the post-requisite topic, we’re just applying a specific formula for variance in a special case of this particular distribution. That also means a weight of zero. You need to know the prerequisite, but you’re not actually getting spaced repetition credit for using it as a subskill.

Let’s see. Mean and variance of the geometric distribution—another prerequisite we need to set.

This topic is pretty similar in spirit to the negative binomial distribution, just with a different distribution. The question is, if you’re solving problems from this topic, are you getting any credit on this topic? They’re different distributions, so I don’t think you’re actually getting any credit.

The geometric distribution is related to the negative binomial distribution. In particular, you’re adding up a bunch of geometric random variables and measuring the outcome of their sum. There’s a close relationship.

In the case of adding up just one geometric random variable, the negative binomial distribution with r = 1 simplifies exactly to the geometric distribution. The geometric distribution is a special case of the negative binomial distribution.
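A quick way to sanity-check that special case, assuming scipy is available (scipy’s nbinom counts failures before the r-th success, while its geom counts the trial of the first success, so the supports are shifted by one):

```python
from scipy.stats import geom, nbinom

p = 0.3  # success probability (arbitrary example)

for trial in range(1, 6):             # trial on which the first success lands
    nb = nbinom.pmf(trial - 1, 1, p)  # negative binomial with r = 1
    g = geom.pmf(trial, p)            # geometric
    assert abs(nb - g) < 1e-12        # identical probabilities
    print(trial, round(g, 6))
```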

How much credit? It’s similar enough that by practicing the skill, you’re practicing a more general skill. It’s a subskill, but at the same time, there’s also the recognition factor. In the problem statements, we’re always calling this a negative binomial distribution, whereas in these problems, it’s the geometric distribution.

Even though this is a special case, and in the case of r = 1, it’s the same exact work—except here you’re also plugging in a different value for r—the priming still needs to be exercised. This is a case of a fuzzy encompassing relationship. The solution skill is similar, but the priming is not exactly the same.

If a student only works on problems from this topic, are they going to forget the prerequisite topic? Probably. However, the problems are similar enough that some of this practice should count toward the prerequisite. They would remember how to work out the expected value for the negative binomial distribution, but they might forget the relationship to the geometric distribution.

They might need a reminder: “Hey, remember, the geometric distribution is just the case of the negative binomial distribution with r = 1.” From there, it would be easy to solve. This is a fuzzy case. I’m setting this weight at 0.5, giving substantial credit since the solution is mostly encompassed, but the problem statement and priming are not.

All right, let’s move on. Same topic—variance of sums of independent random variables.

This topic shows how, if two random variables are independent, you can distribute the variance across them. It also covers how this works for linear combinations with scalar multiples, using properties of variance to simplify the expression.
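For reference, the property being described is Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) for independent X and Y; a quick simulation with made-up numbers illustrates it:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 3.0, -2.0
x = rng.normal(5.0, 2.0, size=1_000_000)  # Var(X) = 4
y = rng.exponential(1.5, size=1_000_000)  # Var(Y) = 2.25, independent of X

lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + b**2 * np.var(y)
print(round(lhs, 2), round(rhs, 2))  # both close to 9*4 + 4*2.25 = 45
```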

The question is, are we using that subskill here? Since this is variance, we need to look at the part of the lesson where variance is involved. I don’t think we are. We’re just dealing with a single random variable, and it doesn’t have a scalar in front of it.

Checking some of the problem types to confirm—yes, there’s some standard deviation work, but nothing involving distributing variance. I would call this a zero.

Next prerequisite: the negative binomial distribution.

For variance of sums of independent random variables, I’m not sure if it’s actually a prerequisite or if there was a previously stored weight that needs recalibrating. Let’s check if there’s a path from this topic up the knowledge graph to that topic.

Yes, it is a prerequisite. It’s probably needed for some context in an explanation during the lesson. I see why—it’s required because the student needs to understand that we can distribute variance over the sum since the random variables are independent. That’s covered in topic 306, which explains how, if variables are independent, variance distributes this way. It makes sense that it’s a prerequisite.
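To spell out why that fact is needed (an illustrative sketch, not the lesson’s actual derivation): a negative binomial variable counting the trials until the r-th success is a sum of r independent geometric variables, so its variance is r times the geometric variance:

```python
import numpy as np

rng = np.random.default_rng(0)
r, p = 4, 0.3

# Sum of r independent geometric random variables (trials until each success).
geoms = rng.geometric(p, size=(1_000_000, r))
nb_samples = geoms.sum(axis=1)

# Because the geometric variables are independent, variance distributes over
# the sum: Var = r * (1 - p) / p^2.
print(round(nb_samples.var(), 1), round(r * (1 - p) / p**2, 1))  # both near 31.1
```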

Moving on.

Negative binomial distribution. We need to determine if solving problems from this topic gives credit for solving the negative binomial distribution.

This topic introduces the negative binomial distribution and has students compute probabilities. Computing a probability leverages the probability distribution function, which is different from the topic at hand, where the focus is on the formulas for expectation and variance.

No, this won’t get credit for that.

Credit for modeling? Probably not. We’re not really doing any modeling here. No.

Modeling topics ensure the student is comfortable with the distribution and can use the probability density function before tackling more advanced statistics based on that probability density function—or probability mass function, since it’s a discrete random variable.

Introduction to functions sounds like it might have a high encompassing value. Ultimately, the student is given a function representing the expected value and needs to plug values into it, similar to evaluating a function.

There aren’t really any exponents involved in this topic, but at such a high level—wait, there is a squared term in the denominator. Technically, that means students get some exponent practice, but overall, it doesn’t matter much because at this level of probability and statistics, basic exponent knowledge should already be solid. That makes this a full-encompassing relationship.

Modeling with the negative binomial distribution—I thought we did that. For some reason, I have it duplicated. Oh, I see. That’s another topic I need to set values for, and it has the same prerequisite and post-requisite relationship. I need to set data for both, so it shows up twice.

Does modeling with the negative binomial distribution encompass modeling with the geometric distribution? The geometric distribution is a special case of the negative binomial distribution.

First thoughts—this doesn’t have the same priming problem as before because these are word problems. If you understand how to apply the negative binomial distribution in modeling contexts, you can apply the same knowledge here since it’s just the more general version of the geometric distribution.

You’re just dealing with the special case of r = 1—one geometric variable in the sum. This looks like a full-encompassing relationship, but it’s important to verify before setting it. You don’t want to give students credit for something they aren’t actually practicing, or they’ll forget it.

Let’s check. One question is, which situations can be modeled with that distribution? Let’s compare similar cases.

This one involves computing the success probability on a particular spin.

We have to identify the success probability and determine how many successes we need. This is the case where the first success or first failure occurs. In this problem statement, the value of p represents the failure probability because we’re wondering when the student will first fail or when the archer will first miss the shot.

If you can solve this kind of problem, you’ll be able to solve similar problems. You’re implicitly practicing the same skill. That makes this a full-encompassing relationship.

I’m going to be doing a lot of this for the next hour or so. I have quite a few to set—probably several hours’ worth. Setting weights in the knowledge graph requires a lot of domain expertise and effort. There are tens of thousands of these weights.

I’m going a little slower right now because I have to talk through and explain what’s going on. Normally, I go through these faster when I don’t have to verbalize the process. Still, it’s not a fast task, and it takes considerable work.

But that’s what you have to do when you want to build a solution that actually solves a problem. You put in the hard work. You try to work smarter when possible, but sometimes, even after approaching the problem as cleverly as possible, it still comes down to human effort in the end. You put in that effort, feed it to your model, and now your model understands what’s going on.

Then your model can make good decisions. That’s how it’s done.

Jason messaged me earlier, so I should probably cut this off and respond to him.

Prompt

The following prompt was used to generate this transcript.

You are a grammar cleaner. All you do is clean grammar, remove single filler words such as “yeah” and “like” and “so”, remove any phrases that are repeated consecutively verbatim, and make short paragraphs separated by empty lines. Do not change any word choice, or leave any information out. Do not summarize or change phrasing. Please clean the attached text. It should be almost exactly verbatim. Keep all the original phrasing. Do not censor.

I manually ran this on each segment of a couple thousand characters of text from the original transcript.


