Q&A: Why can Any Set of Data Points with Different Inputs be Fit Perfectly by a Polynomial of Sufficient Degree?

by Justin Skycak on

Cross-posted from here.

In general, if you have a $n$th-degree polynomial $p(x) = a_0 + a_1 x + \ldots + a_n x^n,$ and you’re fitting it to the set of $m$ data points ${ (x_1,y_1), \ldots, (x_m,y_m) },$ then you’re trying to solve the following system of linear equations:

$$\begin{align*} y_1 &= p(x_1) = a_0 + a_1 x_1 + \ldots + a_n x_1^n \\ &\vdots \\ y_m &= p(x_m) = a_0 + a_1 x_m + \ldots + a_n x_m^n \\ \end{align*}$$


Let’s write this using vectors:

$$\begin{align*} \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} = \begin{bmatrix} a_0 \\ \vdots \\ a_0 \end{bmatrix} + \begin{bmatrix} a_1x_1 \\ \vdots \\ a_1x_m \end{bmatrix} + \cdots + \begin{bmatrix} a_nx_1^n \\ \vdots \\ a_nx_m^n \end{bmatrix} \end{align*}$$


And simplify a bit:

$$\begin{align*} \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} = a_0 \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} + a_1 \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix} + \cdots + a_n \begin{bmatrix} x_1^n \\ \vdots \\ x_m^n \end{bmatrix} \end{align*}$$


In other words, a solution exists if the $y$-vector is in the span of the $x$-vectors:

$$\begin{align*} \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix} \in \textrm{span} \left \{ \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}, \cdots, \begin{bmatrix} x_1^n \\ \vdots \\ x_m^n \end{bmatrix} \right \} \end{align*}$$


Since the $y$-values are arbitrary, this is equivalent to saying that the $x$-vectors span all possible $y$-vectors in $\mathbb{R}^m{:}$

$$\begin{align*} \textrm{span} \left \{ \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}, \cdots, \begin{bmatrix} x_1^n \\ \vdots \\ x_m^n \end{bmatrix} \right \} = \mathbb{R}^m \end{align*}$$


You can span $\mathbb{R}^m$ with and only with a set of $m$ independent vectors in $\mathbb{R}^m.$ So, let’s count off $m$ vectors:

  • The $1$st vector is $\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$ which you can think of as $\begin{bmatrix} x_1^0 \\ \vdots \\ x_m^0 \end{bmatrix},$
  • the components of the $2$nd vector $\begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix},$ i.e. $\begin{bmatrix} x_1^1 \\ \vdots \\ x_m^1 \end{bmatrix},$ have exponents $1,$
  • the components of the $3$rd vector (not shown above, but it would be $\begin{bmatrix} x_1^2 \\ \vdots \\ x_m^2 \end{bmatrix}$) have exponents $2,$
  • ...
  • the components of the $m$th vector have exponents $m-1.$


So, you’re guaranteed a perfect fit if $n$ is at least $m-1.$

In other words, any set of $m$ data points with different inputs can be fit perfectly by a polynomial whose degree is at least $m-1.$