Friday, April 16, 2010

Bias and Variance

Consider $\langle (y-\theta)^2\rangle $, the expected squared error between an estimator $y$, a random variable, and a target $\theta$ (fixed, not random).

After a small manipulation, adding and subtracting $\langle y \rangle$ to get $ \langle (y-\langle y \rangle +\langle y\rangle-\theta)^2\rangle $, we can expand the square and write

\[\langle(y-\langle y\rangle)^2+(\langle y\rangle-\theta)^2+2(y-\langle y\rangle)(\langle y\rangle-\theta)\rangle\]

By linearity of expectation, and since $\langle y\rangle-\theta$ is a constant, this is

\begin{equation*}
\langle (y-\langle y\rangle)^2\rangle+\langle (\langle y\rangle -\theta)^2\rangle+2 \langle y-\langle y\rangle \rangle \langle \langle y\rangle-\theta\rangle
\end{equation*}


But $\langle y- \langle y\rangle \rangle$ is $0$, and the expectation around the constant $(\langle y\rangle-\theta)^2$ does nothing, so we can write

\begin{equation} \label{bv1}
\langle (y-\langle y\rangle)^2\rangle+ (\langle y\rangle-\theta)^2
\end{equation}

That is, the variance plus the bias squared. An immediate consequence is that if your estimator is unbiased, the mean squared error of your estimator is the same as your estimator's variance. That's a nice result.
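As a quick sanity check, here is a small Monte Carlo sketch in Python/NumPy. The estimator, a shrunken sample mean $y = 0.9\,\bar{x}$ of $n$ Gaussian draws centered at $\theta$, is a deliberately biased toy choice, and the factor $0.9$ and the other numbers are arbitrary assumptions, not part of the derivation.

import numpy as np

rng = np.random.default_rng(0)

theta = 2.0                # fixed target
n, trials = 10, 200_000    # draws per estimate, Monte Carlo repetitions

# Toy biased estimator: shrink the sample mean toward zero.
x = rng.normal(loc=theta, scale=1.0, size=(trials, n))
y = 0.9 * x.mean(axis=1)

mse = np.mean((y - theta) ** 2)
variance = y.var()
bias_sq = (y.mean() - theta) ** 2

print(mse, variance + bias_sq)   # the two numbers should agree closely

With these choices the variance is about $0.081$ and the squared bias about $0.04$, and their sum matches the estimated mean squared error.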

Let's try to generalize a little. What if $\theta$ is itself a random variable? Say, $\theta = \hat{f} + \epsilon$, where $\hat{f}$ is fixed and $\epsilon$ is a Gaussian random variable with mean zero and variance $\sigma^2$, independent of $y$.

Running through the same steps as before, but keeping the expectation over $\epsilon$ on the second term now that $\theta$ is random (the cross term still vanishes, using $\langle y-\langle y\rangle\rangle = 0$ and the independence of $y$ and $\epsilon$), we get the analogue of \eqref{bv1}:

\[ \langle (y-\langle y\rangle)^2\rangle+\langle (\langle y\rangle -(\hat{f} + \epsilon))^2\rangle\]

Expanding the second term gives $(\langle y\rangle - \hat{f})^2 - 2(\langle y\rangle - \hat{f})\langle\epsilon\rangle + \langle\epsilon^2\rangle$, and the cross term vanishes because $\langle\epsilon\rangle = 0$, so we have

\[ \langle (y-\langle y\rangle)^2\rangle+(\langle y\rangle - \hat{f})^2 + \langle\epsilon^2\rangle \]

Or: the variance, plus the squared bias, plus an unavoidable noise term $\langle\epsilon^2\rangle = \sigma^2$ that no choice of estimator can remove.
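The same Monte Carlo check extends to the noisy-target case. Again, this is a sketch under assumed values: the same shrunken-mean toy estimator as above, with the target drawn as $\theta = \hat{f} + \epsilon$ independently of the data used to form $y$.

import numpy as np

rng = np.random.default_rng(1)

f_hat = 2.0                 # fixed underlying value, \hat{f}
sigma = 0.5                 # standard deviation of the target noise
n, trials = 10, 200_000

# Same toy biased estimator; the target noise is drawn independently of x.
x = rng.normal(loc=f_hat, scale=1.0, size=(trials, n))
y = 0.9 * x.mean(axis=1)
theta = f_hat + rng.normal(loc=0.0, scale=sigma, size=trials)

mse = np.mean((y - theta) ** 2)
decomposition = y.var() + (y.mean() - f_hat) ** 2 + sigma ** 2

print(mse, decomposition)   # variance + bias^2 + sigma^2 should match the MSE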