Bias-Variance Tradeoff

More reading: Bias-Variance Tradeoff (Wikipedia)

What is bias error?

Bias is the error a model makes by relying on overly simple assumptions about the data. For example, if you use linear regression to capture a non-linear pattern, the model will not learn the pattern well, no matter how much data is available. This is called underfitting.

In statistics, an estimator's bias (or bias function) is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. In other words, bias measures the systematic difference between the value estimated by a model or measurement method and the true value.
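As a quick numerical check of this definition, the sketch below estimates the bias (the mean of many estimates minus the true value) of two variance estimators from Python's standard library: `statistics.pvariance` divides by n and is biased, while `statistics.variance` divides by n - 1 and is unbiased. The normal population with variance 4, the sample size n = 5, and the helper `estimate_bias` are assumptions made up for this illustration.

```python
import random
import statistics


def estimate_bias(estimator, true_value, n=5, trials=10000, sigma=2.0, seed=0):
    """Monte Carlo estimate of E[estimator(sample)] - true_value.

    Samples of size n are drawn from a normal distribution with mean 0 and
    standard deviation sigma (an assumption for this illustration).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    estimates = [
        estimator([rng.gauss(0.0, sigma) for _ in range(n)]) for _ in range(trials)
    ]
    return statistics.fmean(estimates) - true_value


true_variance = 2.0 ** 2  # the population variance we are trying to estimate

# pvariance divides by n: its expected value is (n - 1) / n * true_variance.
biased = estimate_bias(statistics.pvariance, true_variance)
# variance divides by n - 1: its expected value equals true_variance.
unbiased = estimate_bias(statistics.variance, true_variance)

print(f"bias of statistics.pvariance: {biased:+.3f}")
print(f"bias of statistics.variance:  {unbiased:+.3f}")
```

For n = 5, theory predicts a bias of (4/5) × 4 - 4 = -0.8 for the divide-by-n estimator, so the first value should land near -0.8 and the second near 0, up to Monte Carlo noise.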

More generally, bias refers to the difference between the true or correct value of some quantity and the measurement or estimate of that quantity. In principle, therefore, it cannot be calculated unless the true value is known, a limitation that matters to varying degrees in different problems.

In the simplest kind of problem, the true value is known (as when the center of a target is visible and the distance of a shot from the center can be measured, a common analogy), and bias is then usually calculated as the difference between the true value and the mean (or occasionally some other summary) of the measurements or estimates.
In other problems, some careful method is regarded as the state of the art, yielding the best possible measurements, and other methods are regarded as more or less biased according to their degree of systematic departure from that best method (in some fields termed a gold standard).
In yet other problems, we have one or more methods that are all deficient to some degree, and assessing bias is then difficult or impossible. It is then tempting, or possibly even natural, to change the question and judge truth according to consistency between methods.
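The first two cases amount to a simple computation. The sketch below uses made-up shot coordinates and made-up paired instrument readings (all values are hypothetical) to compute bias as the mean of the measurements minus the true value and, when no true value is available, as the mean departure from a gold-standard method.

```python
import statistics

# Case 1: the true value is known. Hypothetical shots at a target centered on (0, 0).
true_center = (0.0, 0.0)
shots = [(1.2, 0.8), (0.9, 1.1), (1.4, 0.7), (1.0, 1.3), (1.1, 0.9)]

# Bias per coordinate: mean of the measurements minus the true value.
bias_x = statistics.fmean(x for x, _ in shots) - true_center[0]
bias_y = statistics.fmean(y for _, y in shots) - true_center[1]
print(f"bias vs. true center: ({bias_x:.2f}, {bias_y:.2f})")

# Case 2: no true value, but a trusted reference method exists.
# Hypothetical paired readings of the same quantities by both methods.
gold_standard = [20.1, 22.4, 25.0, 18.7]
cheap_method = [20.6, 22.8, 25.6, 19.1]

# Bias relative to the gold standard: the average systematic departure from it.
relative_bias = statistics.fmean(c - g for c, g in zip(cheap_method, gold_standard))
print(f"bias vs. gold standard: {relative_bias:.3f}")
```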

What is variance error?

Variance is the error produced when the model adapts too closely to the training data, capturing even the noise. This makes the model overly sensitive to small changes in the data. This is called overfitting.

Variance is the variability of a model's predictions for a given data point when the model is trained on different samples of data (more generally, variance is a value that tells us the spread of a set of values). A model with high variance pays too much attention to the training data and does not generalize to unseen data. As a result, such models perform very well on training data but have high error rates on test data.
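A 1-nearest-neighbor regressor is a convenient example of a high-variance model: each training point is its own nearest neighbor, so the training error is exactly zero, while predictions on new data inherit the noise of whichever single training point happens to be closest. The sine-plus-noise data, the sample sizes, and the helper names below are assumptions for illustration.

```python
import math
import random


def make_data(n, rng, noise=0.3):
    """Draw n points from y = sin(x) + Gaussian noise, x uniform on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, noise)) for x in xs]


def one_nn_predict(train, x):
    """1-nearest-neighbor regression: return the target of the closest training x."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def mse(train, data):
    """Mean squared error of the 1-NN model built on `train`, evaluated on `data`."""
    return sum((one_nn_predict(train, x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(42)
train = make_data(50, rng)
test_data = make_data(500, rng)

train_mse = mse(train, train)     # every training point is its own nearest neighbor
test_mse = mse(train, test_data)  # new points inherit the noise of a single neighbor
print(f"training MSE: {train_mse:.3f}")
print(f"test MSE:     {test_mse:.3f}")
```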

Bias vs Variance

This tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples and vice versa. The bias-variance dilemma or problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set.

A high bias error is due to erroneous or overly simplistic assumptions in the learning algorithm you’re using. This can lead the model to underfit your data, making it hard to achieve high predictive accuracy and to generalize from the training set to the test set.

A high variance error is due to too much complexity in the learning algorithm you’re using. The algorithm becomes highly sensitive to small fluctuations in your training data, which can lead your model to overfit. The model then carries so much noise from the training data that it is of little use on the test data.
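To see the two error sources side by side, the sketch below draws many training sets from the same assumed process (y = sin(x) plus Gaussian noise), fits each model on every set, and measures the squared bias and the variance of its prediction at one point, x0 = π/2. Ordinary least-squares linear regression stands in for the overly simple model and 1-nearest-neighbor regression for the overly complex one; all names and settings are illustrative assumptions.

```python
import math
import random
import statistics

NOISE = 0.3  # assumed standard deviation of the noise


def true_f(x):
    return math.sin(x)


def make_train(n, rng):
    """One training set from y = sin(x) + noise, x uniform on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, NOISE)) for x in xs]


def linear_predict(train, x0):
    """Simple model: least-squares line y = a + b*x, evaluated at x0."""
    mean_x = statistics.fmean(x for x, _ in train)
    mean_y = statistics.fmean(y for _, y in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    return mean_y + (sxy / sxx) * (x0 - mean_x)


def one_nn_predict(train, x0):
    """Complex model: target of the single closest training point."""
    return min(train, key=lambda point: abs(point[0] - x0))[1]


def bias_variance(predict, x0=math.pi / 2, n=30, repeats=2000, seed=1):
    """Squared bias and variance of predict(train, x0) over many training sets."""
    rng = random.Random(seed)
    preds = [predict(make_train(n, rng), x0) for _ in range(repeats)]
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    return bias_sq, statistics.pvariance(preds)


for name, model in [("linear", linear_predict), ("1-NN", one_nn_predict)]:
    bias_sq, variance = bias_variance(model)
    print(f"{name:>6}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

With these settings the linear model should show a squared bias far larger than its variance, because no straight line can follow the sine curve, while 1-NN should show the opposite pattern.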

| Model type | Bias | Variance | Characteristics |
|---|---|---|---|
| Linear Regression | High | Low | Very stiff model; doesn't capture non-linear relations. Easy to interpret. |
| Deep Decision Tree | Low | High | Adjusts too closely to the training set (overfitting). |
| Shallow Tree | Moderate | Moderate | Better balance, but less predictive strength. |
| Random Forest (many trees) | Low | Lower than a single deep tree | Reduces variance by averaging trees. Generalizes well. |
| KNN with k=1 | Low | Very high | Depends entirely on the training data. Very sensitive. |
| Small Neural Network (few layers) | High | Low | Cannot learn very complex relationships. |
| Large Non-regularized Neural Network | Low | High | Strong tendency to overfit. |

Too simple models: bias error. Too complex models: variance error.

Error decomposition

The bias-variance decomposition breaks the expected prediction error of a learning algorithm (under squared-error loss) into the sum of the squared bias, the variance, and an irreducible error due to noise in the underlying data. If you make the model more complex and add more variables, you’ll reduce bias but increase variance; to minimize the total error, you have to trade off bias against variance. You don’t want either high bias or high variance in your model.

total_error = bias_error² + variance_error + irreducible_error
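The decomposition can be checked numerically. Written out at a point x0, with ŷ the model's prediction there and σ² the noise variance, it reads E[(y - ŷ)²] = (E[ŷ] - f(x0))² + Var[ŷ] + σ². The sketch below assumes y = sin(x) plus Gaussian noise with σ = 0.3 and a 5-nearest-neighbor regressor, retrains on fresh data many times, and compares the average squared error on a new noisy observation against the sum of the three terms.

```python
import math
import random
import statistics

NOISE = 0.3  # assumed standard deviation of the irreducible noise


def true_f(x):
    return math.sin(x)


def knn_predict(train, x0, k=5):
    """Average the targets of the k training points closest to x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return statistics.fmean(y for _, y in nearest)


rng = random.Random(7)
x0 = 1.0
preds, sq_errors = [], []
for _ in range(5000):  # each iteration: a new training set and a new test label
    xs = [rng.uniform(0.0, 2 * math.pi) for _ in range(40)]
    train = [(x, true_f(x) + rng.gauss(0.0, NOISE)) for x in xs]
    pred = knn_predict(train, x0)
    y_new = true_f(x0) + rng.gauss(0.0, NOISE)  # fresh noisy observation at x0
    preds.append(pred)
    sq_errors.append((y_new - pred) ** 2)

total_error = statistics.fmean(sq_errors)
bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
variance = statistics.pvariance(preds)
irreducible = NOISE ** 2

print(f"total error           : {total_error:.4f}")
print(f"bias^2 + var + noise  : {bias_sq + variance + irreducible:.4f}")
```

The two printed numbers should agree up to Monte Carlo noise; more repetitions bring them closer.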

How can we overcome it?

If our model complexity exceeds this sweet spot [the complexity level at which total error is lowest], we are in effect over-fitting our model; while if our complexity falls short of the sweet spot, we are under-fitting the model. In practice, there is not an analytical way to find this location. Instead we must use an accurate measure of prediction error, explore differing levels of model complexity, and then choose the complexity level that minimizes the overall error. A key to this process is the selection of an accurate error measure, as often grossly inaccurate measures are used which can be deceptive.
Scott Fortmann-Roe - Understanding the Bias-Variance tradeoff
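The search described in the quote can be sketched as a sweep over one complexity knob. Assuming k-nearest-neighbor regression as the model (small k is flexible, large k is smooth) and a held-out validation set as the error measure, the sketch below computes the validation error for several candidate values of k and keeps the one with the lowest error. The data-generating process and the candidate list are hypothetical.

```python
import math
import random


def make_data(n, rng, noise=0.3):
    """Draw n points from y = sin(x) + Gaussian noise, x uniform on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, noise)) for x in xs]


def knn_predict(train, x0, k):
    """Average the targets of the k training points closest to x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(3)
train = make_data(100, rng)
validation = make_data(200, rng)  # held out: used only to compare complexities

# Small k = complex, flexible model; large k = simple, smooth model.
candidate_ks = [1, 3, 5, 10, 20, 40, 80]
val_errors = {k: mse(train, validation, k) for k in candidate_ks}
best_k = min(val_errors, key=val_errors.get)

for k, err in val_errors.items():
    print(f"k={k:>2}: validation MSE = {err:.4f}")
print("chosen k:", best_k)
```

Both ends of the sweep should score poorly here: k = 1 because it chases noise, and k = 80 because it averages over most of the input range. In practice, k-fold cross-validation gives a less noisy error estimate than a single validation split.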

Some ways to balance bias and variance:

- Regularization techniques (L1 or L2).
- Applying cross-validation, often a good option.
- Ensemble methods such as bagging, and resampling techniques.
  - One modeling algorithm that makes use of bagging is Random Forests. Here, the bias of the full model is essentially the bias of a single decision tree, which by itself has high variance. By growing many of these trees, in effect a “forest”, and then averaging them, the variance of the final model can be greatly reduced compared with that of a single tree.
  - For further details, see ensemble methods.
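The variance reduction from bagging can be seen in a few lines. Random Forests bag decision trees; to keep the sketch dependency-free, it swaps in a 1-nearest-neighbor regressor, which is likewise a low-bias, high-variance base model. For many independent training sets, it compares the spread of a single model's prediction at one point with that of an average over 50 models fitted on bootstrap resamples. The data process, `n_models`, and the helper names are assumptions for illustration.

```python
import math
import random
import statistics

NOISE = 0.3  # assumed noise standard deviation


def make_train(n, rng):
    """One training set from y = sin(x) + noise, x uniform on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, NOISE)) for x in xs]


def one_nn_predict(train, x0):
    """Deliberately high-variance base model: 1-nearest-neighbor regression."""
    return min(train, key=lambda point: abs(point[0] - x0))[1]


def bagged_predict(train, x0, rng, n_models=50):
    """Bagging: fit the base model on bootstrap resamples and average the predictions."""
    preds = []
    for _ in range(n_models):
        boot = rng.choices(train, k=len(train))  # sample with replacement, same size
        preds.append(one_nn_predict(boot, x0))
    return statistics.fmean(preds)


rng = random.Random(5)
x0 = 2.0
single, bagged = [], []
for _ in range(300):  # many independent training sets
    train = make_train(30, rng)
    single.append(one_nn_predict(train, x0))
    bagged.append(bagged_predict(train, x0, rng))

print(f"variance, single 1-NN: {statistics.pvariance(single):.4f}")
print(f"variance, bagged 1-NN: {statistics.pvariance(bagged):.4f}")
```

The bagged variance should come out noticeably lower. Random Forests add a second ingredient not shown here, random feature selection at each split, which further decorrelates the trees.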

Additional remarks:

- Resampling-based measures such as cross-validation should generally be preferred over theoretical measures such as Akaike’s Information Criterion (AIC).
  - AIC compares the quality of a set of statistical models relative to each other, balancing goodness of fit against the number of parameters. For example, you might be interested in which variables contribute to low socioeconomic status and how each variable contributes to that status. Say you create several regression models for various factors like education, family size, or disability status; AIC scores each model so they can be ranked from best to worst (lower is better). The “best” model is the one that neither under-fits nor over-fits.
- Adjusting hyperparameters of some estimators:
  - Both the k-nearest neighbors (KNN) and Support Vector Machine (SVM) algorithms tend to have low bias and high variance, but the trade-off can be shifted in both cases:
    - In KNN, increasing k increases the number of neighbors that contribute to each prediction. This in turn increases the bias of the model and decreases its variance.
    - In the SVM, the trade-off is controlled by the C parameter, which governs how many margin violations are tolerated in the training data. When C is defined as a budget for violations (as in some textbooks), increasing C allows more violations, which increases the bias but decreases the variance. In libraries where C is a penalty on violations (for example, scikit-learn), the direction is reversed: decreasing C increases the bias and decreases the variance.
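The effect of k in KNN can be measured directly. Assuming synthetic toy data (y = sin(x) plus Gaussian noise, evaluated at the peak x0 = π/2), the sketch estimates the squared bias and the variance of a k-nearest-neighbor prediction over many training sets for k = 1, 5, and 25. All settings and helper names are hypothetical.

```python
import math
import random
import statistics

NOISE = 0.3  # assumed noise standard deviation


def make_train(n, rng):
    """One training set from y = sin(x) + noise, x uniform on [0, 2*pi]."""
    xs = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, NOISE)) for x in xs]


def knn_predict(train, x0, k):
    """Average the targets of the k training points closest to x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return statistics.fmean(y for _, y in nearest)


def bias_variance_for_k(k, x0=math.pi / 2, n=50, repeats=1000, seed=11):
    """Squared bias and variance of the k-NN prediction at x0 over many training sets."""
    rng = random.Random(seed)  # same seed, so every k sees the same training sets
    preds = [knn_predict(make_train(n, rng), x0, k) for _ in range(repeats)]
    bias_sq = (statistics.fmean(preds) - math.sin(x0)) ** 2
    return bias_sq, statistics.pvariance(preds)


results = {k: bias_variance_for_k(k) for k in (1, 5, 25)}
for k, (bias_sq, variance) in results.items():
    print(f"k={k:>2}: bias^2 = {bias_sq:.4f}, variance = {variance:.4f}")
```

Going from k = 1 to larger k should cut the variance sharply, while at k = 25 the neighborhood reaches far down the sides of the peak and the squared bias dominates. The exact numbers depend on the assumed data and on Monte Carlo noise.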
