Multiple Regression and Interaction Terms
In many real-life situations, there is more than one input variable that controls the output variable.
This post is part of the book Introduction to Algorithms and Machine Learning: from Sorting to Strategic Agents. Suggested citation: Skycak, J. (2022). Multiple Regression and Interaction Terms. In Introduction to Algorithms and Machine Learning: from Sorting to Strategic Agents. https://justinmath.com/multiple-regression-and-interaction-terms/
In many real-life situations, there is more than one factor that controls the quantity we’re trying to predict. That is to say, there is more than one input variable that controls the output variable.
Example: Multiple Input Variables
For example, suppose that a food manufacturing company is testing out different ingredients on sandwiches, including peanut butter, jelly, and roast beef. The company fed sandwiches to subjects and recorded the proportion of subjects who liked each sandwich.
We want to build a model that has $3$ input variables:

- $x_1,$ the amount of peanut butter on the sandwich (scoops)
- $x_2,$ the amount of jelly on the sandwich
- $x_3,$ the amount of roast beef on the sandwich (slices)

The model will predict $1$ output variable:

- $y,$ the proportion of subjects who like the sandwich
Since this output variable must be between $0$ and $1,$ we will use logistic regression.
A logistic model with a single input variable takes the form

$\begin{align*} y = \dfrac{1}{1 + e^{-(ax+b)}} \end{align*}$

Here, we have $3$ different input variables, so we will introduce a new term for each input variable:

$\begin{align*} y = \dfrac{1}{1 + e^{-(a_1 x_1 + a_2 x_2 + a_3 x_3 + b)}} \end{align*}$
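In code, the multivariable model might look like the following minimal sketch (the function name and the use of NumPy are our own choices here, not prescribed by the text):

```python
import numpy as np

def predict(a, b, x):
    # Multivariable logistic model:
    # y = 1 / (1 + e^-(a1*x1 + a2*x2 + a3*x3 + b))
    # a is a vector of coefficients, x a vector of input variables.
    return 1 / (1 + np.exp(-(np.dot(a, x) + b)))
```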
We should also introduce terms that represent interactions between the variables, but to keep things simple and illustrate why such terms are needed, let’s continue without them.
If we fit the above model to our data set by running gradient descent a handful of times with different initial guesses and choosing the best result, we get a fitted model.
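Here is one minimal sketch of such a fitting procedure. The squared-error loss, the learning rate, and the tiny data set below are all assumptions for illustration; they are not the data set or hyperparameters used in the text.

```python
import numpy as np

def predict(a, b, X):
    # Logistic model applied to each row of X.
    return 1 / (1 + np.exp(-(X @ a + b)))

def loss(a, b, X, y):
    # Squared error between predictions and observed proportions.
    return np.sum((predict(a, b, X) - y) ** 2)

def fit(X, y, lr=0.1, steps=5000, tries=10, seed=0):
    # Run gradient descent from several random initial guesses
    # and keep the best result.
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(tries):
        a = rng.normal(size=X.shape[1])
        b = rng.normal()
        for _ in range(steps):
            p = predict(a, b, X)
            # Gradient of the squared-error loss through the logistic.
            g = 2 * (p - y) * p * (1 - p)
            a -= lr * (X.T @ g)
            b -= lr * g.sum()
        if best is None or loss(a, b, X, y) < best[0]:
            best = (loss(a, b, X, y), a, b)
    return best

# Hypothetical data: columns are scoops of peanut butter, amount of
# jelly, and slices of roast beef; y is the proportion of subjects
# who liked each sandwich. (Illustrative numbers only.)
X = np.array([[1, 1, 0], [2, 1, 0], [0, 0, 1], [0, 0, 2], [1, 0, 1]])
y = np.array([0.9, 0.95, 0.6, 0.7, 0.0])
print(fit(X, y))
```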
The Need for Interaction Terms
When we compare this model's predictions to the actual data, some predictions seem accurate, but others do not.
The weirdest inaccurate prediction is that the model overrates peanut butter & roast beef sandwiches: it predicts that half of the subjects will like them, when in reality, none of the subjects did. And if you try to imagine that combination of ingredients, it probably doesn't seem appetizing.
The problem is that our model is not sophisticated enough to capture the idea that two ingredients can taste good alone but bad together (or vice versa). It’s easy to see why this is:
- The logistic function $\dfrac{1}{1 + e^{-(ax+b)}}$ is increasing if $a > 0$ and decreasing if $a < 0.$
- The coefficient on $x_1$ (peanut butter) is $a_1 = 1.02$ and the coefficient on $x_3$ (roast beef) is $a_3 = 1.91.$
- Both of these coefficients are positive. Consequently, the higher $x_1$ (the more scoops of peanut butter), the higher the prediction will be. Likewise, the higher $x_3$ (the more slices of roast beef), the higher the prediction will be, as the quick check below illustrates.
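A quick numerical check (the intercept $b$ here is a hypothetical stand-in, since the text only quotes $a_1$ and $a_3$):

```python
import numpy as np

# Coefficients quoted above: a1 (peanut butter), a3 (roast beef).
# The intercept b is a hypothetical stand-in for illustration.
a1, a3, b = 1.02, 1.91, -2.0

def predict(pb, rb):
    # Jelly is held at zero, so its term drops out entirely.
    return 1 / (1 + np.exp(-(a1 * pb + a3 * rb + b)))

print(predict(1, 0))  # peanut butter alone
print(predict(1, 1))  # adding roast beef can only raise the prediction
```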
Interaction Terms
To fix this, we can add interaction terms that multiply two variables together. These terms will vanish unless both variables are nonzero:

$\begin{align*} y = \dfrac{1}{1 + e^{-(a_1 x_1 + a_2 x_2 + a_3 x_3 + a_{12} x_1 x_2 + a_{13} x_1 x_3 + a_{23} x_2 x_3 + b)}} \end{align*}$
The interaction terms above are $a_{12} x_1 x_2,$ $a_{13} x_1 x_3,$ and $a_{23} x_2 x_3.$ The subscripts indicate which variables are being multiplied together.
Notice that, for example, the interaction term $a_{13}x_1 x_3$ will not have an effect on the predictions for $x_1$ (peanut butter) or $x_3$ (roast beef) in isolation, but it will have an effect when these ingredients are combined.
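In code, one convenient way to implement this (a sketch that reuses the earlier fitting routine by treating each product as an extra input column) is:

```python
import numpy as np

def add_interactions(X):
    # Append the products x1*x2, x1*x3, x2*x3 as extra columns, so a
    # fitting routine for the plain multivariable model can be reused
    # unchanged on the expanded model.
    x1, x2, x3 = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([X, x1 * x2, x1 * x3, x2 * x3])
```

With this transformation, fitting the interaction model is just `fit(add_interactions(X), y)` using the earlier sketch, and the last three fitted coefficients play the roles of $a_{12},$ $a_{13},$ and $a_{23}.$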
If we fit this expanded model, again running gradient descent several times and keeping the best result, we get a new fitted model.
Now, the model makes much more accurate predictions.
As a sanity check, we can also interpret the coefficients of the interaction terms:
- The interaction term between $x_1$ (peanut butter) and $x_2$ (jelly) is $3.82 \, x_1 x_2.$ The positive coefficient indicates that combining peanut butter and jelly should increase the prediction.
- The interaction term between $x_1$ (peanut butter) and $x_3$ (roast beef) is $-4.82 \, x_1 x_3.$ The negative coefficient indicates that combining peanut butter and roast beef should decrease the prediction.
- The interaction term between $x_2$ (jelly) and $x_3$ (roast beef) is $-3.34 \, x_2 x_3.$ The negative coefficient indicates that combining jelly and roast beef should decrease the prediction.
Intuitively, this all makes sense. Peanut butter & jelly go together, but peanut butter & roast beef do not, and neither do jelly & roast beef.
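As a quick sanity check in code (using the interaction coefficients quoted above together with hypothetical stand-ins for the linear coefficients and intercept, which are not reproduced here):

```python
import numpy as np

# Interaction coefficients quoted from the fitted model.
a12, a13, a23 = 3.82, -4.82, -3.34

# Hypothetical stand-ins for the linear coefficients and intercept.
a1, a2, a3, b = 1.0, 1.0, 1.9, -2.0

def predict(x1, x2, x3):
    z = (a1 * x1 + a2 * x2 + a3 * x3
         + a12 * x1 * x2 + a13 * x1 * x3 + a23 * x2 * x3 + b)
    return 1 / (1 + np.exp(-z))

print(predict(1, 1, 0))  # peanut butter + jelly: boosted by a12 > 0
print(predict(1, 0, 1))  # peanut butter + roast beef: pulled down by a13 < 0
```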
Exercise
Implement the example that was worked out above.