Actual model:
$y = f(x) + \epsilon$,
where $\epsilon$ is the irreducible error (noise with $E[\epsilon] = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2$).
Estimation:
We approximate $f(x)$ with $\hat{f}(x)$, estimated from the training data.
Bias and Variance of the Estimation:
$\mathrm{Bias}[\hat{f}(x)] = E[\hat{f}(x)] - f(x)$
$\mathrm{Var}[\hat{f}(x)] = E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big]$
Here, $E[f(x)] = f(x)$, since $f$ is deterministic.
MSE of estimation as a function of bias and variance of estimation:
$E\big[(y - \hat{f}(x))^2\big] = \mathrm{Bias}[\hat{f}(x)]^2 + \mathrm{Var}[\hat{f}(x)] + \sigma^2$
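For completeness, here is a short derivation sketch of this decomposition (added here, not part of the original post), using the standard assumptions that $\epsilon$ has zero mean, variance $\sigma^2$, and is independent of $\hat{f}(x)$:
$$
\begin{aligned}
E\big[(y - \hat{f}(x))^2\big]
  &= E\big[(f(x) + \epsilon - \hat{f}(x))^2\big] \\
  &= E\big[(f(x) - \hat{f}(x))^2\big]
     + 2\,E[\epsilon]\,E\big[f(x) - \hat{f}(x)\big] + E[\epsilon^2] \\
  &= E\big[(f(x) - \hat{f}(x))^2\big] + \sigma^2 \\
  &= \big(f(x) - E[\hat{f}(x)]\big)^2
     + E\big[\big(\hat{f}(x) - E[\hat{f}(x)]\big)^2\big] + \sigma^2 \\
  &= \mathrm{Bias}\big[\hat{f}(x)\big]^2 + \mathrm{Var}\big[\hat{f}(x)\big] + \sigma^2
\end{aligned}
$$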
Example of bias-variance tradeoff
- Overfitting is low bias, high variance
- Underfitting is high bias, low variance
- Red line (in the figure): a simple model that underfits; bias is high, variance is low.
- Blue line (in the figure): a flexible model that overfits; bias is low, variance is high.
Properties: Model Complexity
- The more complex the model, the lower the bias.
- The more complex the model, the higher the variance (see the simulation sketch after this list).
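To make the complexity tradeoff concrete, here is a minimal simulation sketch (not from the original post): it repeatedly refits polynomials of a few degrees on fresh noisy samples of an assumed true function $\sin(2\pi x)$ and reports the estimated squared bias and variance. The noise level, sample size, and degrees are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # "True" underlying function (an assumption for this demo).
    return np.sin(2 * np.pi * x)

x_grid = np.linspace(0, 1, 50)        # points where bias/variance are evaluated
n_train, n_trials, sigma = 20, 200, 0.3

for degree in (1, 3, 9):
    preds = np.empty((n_trials, x_grid.size))
    for t in range(n_trials):
        # Fresh training set each trial: y = f(x) + noise.
        x = rng.uniform(0, 1, n_train)
        y = f(x) + rng.normal(0, sigma, n_train)
        coefs = np.polyfit(x, y, degree)        # fit a degree-d polynomial
        preds[t] = np.polyval(coefs, x_grid)    # predictions on the fixed grid
    bias2 = np.mean((preds.mean(axis=0) - f(x_grid)) ** 2)  # squared bias
    var = np.mean(preds.var(axis=0))                          # variance
    print(f"degree={degree}: bias^2={bias2:.3f}, variance={var:.3f}")
```

With these settings, higher degrees should show lower squared bias and higher variance, matching the two bullets above.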
Properties: Regularization
- The lower the regularization parameter $\lambda$, the lower the bias and the higher the variance.
- The higher the $\lambda$, the higher the bias and the lower the variance.
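The same kind of simulation can be pointed at the regularization knob instead of the degree. The sketch below (again illustrative, not from the post) keeps a fixed degree-9 polynomial and sweeps the regularization strength, which in scikit-learn's Ridge is called alpha and plays the role of $\lambda$ here.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)            # assumed true function
x_grid = np.linspace(0, 1, 50).reshape(-1, 1)
n_train, n_trials, sigma = 20, 200, 0.3

for alpha in (1e-6, 1e-2, 10.0):               # lambda: small -> low bias, high variance
    preds = np.empty((n_trials, len(x_grid)))
    for t in range(n_trials):
        X = rng.uniform(0, 1, (n_train, 1))
        y = f(X).ravel() + rng.normal(0, sigma, n_train)
        model = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=alpha))
        model.fit(X, y)
        preds[t] = model.predict(x_grid)
    bias2 = np.mean((preds.mean(axis=0) - f(x_grid).ravel()) ** 2)
    var = np.mean(preds.var(axis=0))
    print(f"alpha={alpha:g}: bias^2={bias2:.3f}, variance={var:.3f}")
```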
Properties: Number of Samples
- Increasing the sample size decreases variance.
- If a learning algorithm is suffering from high bias (under-fit), getting more training data will not (by itself) help much.
- If a learning algorithm is suffering from high variance, getting more training data is likely to help.
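One way to see this in practice is a learning curve: training and validation error as a function of training-set size. The sketch below (illustrative, using scikit-learn's learning_curve on made-up data) fits a deliberately flexible model; as the sample size grows, the gap between training and validation error, the signature of high variance, should narrow.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 200)

# A flexible (high-variance) model: degree-9 polynomial, almost no regularization.
model = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=1e-6))

sizes, train_scores, val_scores = learning_curve(
    model, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
    scoring="neg_mean_squared_error")

for n, tr, va in zip(sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    # As n grows, the train/validation gap (a variance symptom) should shrink;
    # for a high-bias model both errors would instead plateau at a high value.
    print(f"n={n:3d}  train MSE={tr:.3f}  val MSE={va:.3f}")
```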
Summary
- Getting more training examples fixes high variance (overfit model: use more training examples)
- Using a smaller set of features fixes high variance (overfit model: use fewer features)
- Getting additional features fixes high bias (underfit model: get more features)
- Adding polynomial features (i.e. a more complex model) fixes high bias (underfit model: increase model complexity)
- Increasing $\lambda$ fixes high variance (overfit model: increase regularization)
- Decreasing $\lambda$ fixes high bias (underfit model: decrease regularization)
So, for an overfit model (low bias, high variance):
- Increase sample size
- Reduce number of features
- Increase regularization
For an underfit model (high bias, low variance):
- Get more features
- Increase model complexity
- Decrease regularization