Bias-Variance Tradeoff
Bias is the error a model makes by relying on overly simple assumptions about the data. For example, if you use a linear regression to capture a non-linear pattern, the model won't learn it well, no matter how much data is available. This is called underfitting.
In statistics, an estimator's bias (or bias function) is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. Bias therefore measures the systematic difference between the value estimated by a model or measurement method and the real one.
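The definition above can be written compactly. The symbols are notation introduced here, not taken from the original text: \hat{\theta} is the estimator and \theta is the true value being estimated.

```latex
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta,
\qquad
\hat{\theta} \text{ is unbiased} \iff \operatorname{Bias}(\hat{\theta}) = 0 .
```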
Bias refers to the difference between the true or correct value of some quantity and the measurement or estimate of that quantity. In principle, therefore, it cannot be calculated unless that true or correct value is known, although how serious this problem is varies from case to case.
In the simplest kind of problem, the true value is known (as when the center of a target is visible and the distance of a shot from the center can be measured; this is a common analogy) and bias is then usually calculated as the difference between the true value and the mean (or occasionally some other summary) of measurements or estimates.
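As a minimal sketch of this simplest case, the snippet below simulates repeated measurements of a known true value and estimates the bias as the mean measurement minus the true value (the sign convention of the statistical definition). The true value, the systematic offset, and the noise level are hypothetical numbers chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 10.0         # known true value (the "center of the target")
systematic_offset = 0.5   # hypothetical instrument offset, for illustration only
noise_sd = 0.2            # hypothetical random scatter of individual measurements

# Each measurement = truth + systematic offset + random noise.
measurements = true_value + systematic_offset + rng.normal(0.0, noise_sd, size=10_000)

# Bias: how far the average measurement sits from the truth.
bias_estimate = float(measurements.mean() - true_value)
# Spread: how much individual measurements scatter around their own mean.
spread = float(measurements.std(ddof=1))

print(f"estimated bias: {bias_estimate:.3f}, spread: {spread:.3f}")
```

The estimated bias recovers the systematic offset, while the spread reflects only the random scatter; the two are separate properties of the measurement process.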
In other problems, some careful method is regarded as the state of the art and is assumed to yield the best possible measurements. Other methods are then regarded as more or less biased according to their degree of systematic departure from that best method (in some fields termed a gold standard).
In yet other problems, we have one or more methods all deficient to some degree, and assessment of bias is then difficult or impossible. It is then tempting, or possibly even natural, to change the question and judge truth according to consistency between methods.
Variance is the error produced when the model adapts too well to training data, capturing even the noise. This makes it too sensitive to small changes in the data. This is called overfitting.
Variance is the variability of a model's predictions for a given data point when the model is trained on different samples of data; it tells us how spread out those predictions are. A model with high variance pays a lot of attention to training data and does not generalize to unseen data. As a result, such models perform very well on training data but have high error rates on test data.
A high bias error is due to erroneous or overly simplistic assumptions in the learning algorithm you’re using. This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set.
A high variance error is due to too much complexity in the learning algorithm you’re using. This leads to the algorithm being highly sensitive to high degrees of variation in your training data, which can lead your model to overfit the data. You’ll be carrying too much noise from your training data for your model to be very useful for your test data.
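The contrast between underfitting and overfitting can be sketched with polynomial regression on synthetic data. Everything here is an assumption made for illustration: the sine-shaped target, the noise level, the sample sizes, and the degrees 1, 3, and 10.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def true_fn(x):
    # Hypothetical non-linear ground truth.
    return np.sin(2 * np.pi * x)

def make_data(n):
    x = rng.uniform(0.0, 1.0, n)
    y = true_fn(x) + rng.normal(0.0, 0.3, n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(1000)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 3, 10):
    # Least-squares polynomial fit of the given degree.
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

The degree-1 model cannot follow the curve (high bias), so both errors stay high. The degree-10 model drives the training error down, but its test error is noticeably larger than its training error, which is the signature of high variance.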
| Model Type | Bias | Variance | Characteristics |
| --- | --- | --- | --- |
| Linear Regression | High | Low | Very rigid model; doesn't capture non-linear relations. Easy to interpret. |
| Deep Decision Tree | Low | High | Adjusts too closely to the training set. Overfitting. |
| Shallow Tree | Moderate | Moderate | Better balance, but less predictive strength. |
| Random Forest (many trees) | Low | Low | Reduces variance by averaging trees. Generalizes well. |
| KNN with k=1 | Low | Very high | Depends entirely on the training data. Very sensitive. |
| Neural Network (tiny, with few layers) | High | Low | Cannot learn very complex relationships. |
| Big non-regularized Neural Network | Low | High | Strong tendency to overfit. |
The bias-variance decomposition expresses the expected prediction error of a learning algorithm as the sum of the squared bias, the variance, and an irreducible error due to noise in the underlying data. As you make the model more complex and add more variables, you typically lose bias but gain some variance; to get the lowest total error, you have to trade off bias against variance. You don’t want either high bias or high variance in your model.
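For squared-error loss the decomposition can be written out explicitly. The notation is an assumption introduced here: the data follow y = f(x) + \varepsilon with zero-mean noise of variance \sigma^2 that is independent of the fitted model, \hat{f} is the model fitted on a random training set, and expectations are taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Note that the bias enters squared, so positive and negative bias are penalized alike, and the \sigma^2 term cannot be reduced by any choice of model.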
Total error is lowest at some intermediate level of model complexity, the sweet spot. If our model complexity exceeds this sweet spot, we are in effect over-fitting our model; if our complexity falls short of the sweet spot, we are under-fitting the model. In practice, there is no analytical way to find this location. Instead, we must use an accurate measure of prediction error, explore differing levels of model complexity, and then choose the complexity level that minimizes the overall error. A key to this process is the selection of an accurate error measure, as grossly inaccurate measures are often used, which can be deceptive.
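One common way to carry out this search is k-fold cross-validation over a range of complexity levels. The sketch below is only an illustration under stated assumptions: polynomial degree stands in for "model complexity", the data are synthetic, and `cv_mse` is a helper written for this example rather than a library function.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 60)

def cv_mse(x, y, degree, k=5):
    """Mean held-out squared error of a degree-`degree` polynomial over k folds."""
    idx = np.arange(len(x))          # data were generated in random order
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)           # everything not in the held-out fold
        model = Polynomial.fit(x[train], y[train], degree)
        errors.append(np.mean((model(x[fold]) - y[fold]) ** 2))
    return float(np.mean(errors))

# Explore several complexity levels and keep the one with the lowest error.
scores = {degree: cv_mse(x, y, degree) for degree in range(1, 9)}
best_degree = min(scores, key=scores.get)

for degree, score in scores.items():
    print(f"degree {degree}: CV MSE {score:.3f}")
print("selected degree:", best_degree)
```

The error is measured on held-out folds rather than on the training data, because training error keeps falling as complexity grows and would always favor the most complex model.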
Some ways to manage the bias-variance tradeoff:
One modeling algorithm that makes use of bagging is Random Forests. Here, the bias of the full model is equivalent to the bias of a single decision tree, which itself has high variance. By creating many of these trees, in effect a “forest”, and then averaging them, the variance of the final model can be greatly reduced over that of a single tree.
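The variance-reduction effect of bagging can be sketched without a tree implementation. In the example below a 1-nearest-neighbor regressor is swapped in as the high-variance base learner instead of a decision tree, so this is not a random forest; the data, sample sizes, and number of bootstrap models are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_training_set(n=50):
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y

def one_nn_predict(x_train, y_train, x_query):
    # Predict with the target of the closest training point (high-variance learner).
    nearest = np.abs(x_query[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]

def bagged_predict(x_train, y_train, x_query, n_models=50):
    # Bagging: fit the base learner on bootstrap resamples and average the predictions.
    n = len(x_train)
    preds = np.empty((n_models, len(x_query)))
    for m in range(n_models):
        boot = rng.integers(0, n, size=n)     # sample n indices with replacement
        preds[m] = one_nn_predict(x_train[boot], y_train[boot], x_query)
    return preds.mean(axis=0)

x_query = np.linspace(0.1, 0.9, 9)
single, bagged = [], []
for _ in range(200):                          # 200 independent training sets
    x_tr, y_tr = sample_training_set()
    single.append(one_nn_predict(x_tr, y_tr, x_query))
    bagged.append(bagged_predict(x_tr, y_tr, x_query))

# Variance of the prediction across training sets, averaged over query points.
single_var = float(np.var(single, axis=0).mean())
bagged_var = float(np.var(bagged, axis=0).mean())
print(f"single model variance: {single_var:.4f}, bagged variance: {bagged_var:.4f}")
```

Averaging over bootstrap resamples smooths out the dependence on any individual training point, so the bagged predictor varies less from one training set to the next than a single base learner does.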
This tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa. The bias-variance dilemma or problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set.
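This property can be checked empirically by refitting a simple and a more flexible model on many independent samples and measuring squared bias and variance of their predictions. The setup is an assumption made for illustration: a synthetic sine target, polynomial models of degree 1 and 5, and 300 repeated samples of 40 points.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(4)

def true_fn(x):
    # Hypothetical ground truth, known only because the data are simulated.
    return np.sin(2 * np.pi * x)

x_eval = np.linspace(0.2, 0.8, 50)            # points where predictions are compared
n_repeats, n_train, noise_sd = 300, 40, 0.3

def bias2_and_variance(degree):
    preds = np.empty((n_repeats, x_eval.size))
    for r in range(n_repeats):
        # Draw a fresh training sample and refit the model.
        x = rng.uniform(0.0, 1.0, n_train)
        y = true_fn(x) + rng.normal(0.0, noise_sd, n_train)
        preds[r] = Polynomial.fit(x, y, degree)(x_eval)
    mean_pred = preds.mean(axis=0)
    bias2 = float(np.mean((mean_pred - true_fn(x_eval)) ** 2))   # squared bias
    variance = float(np.mean(preds.var(axis=0)))                 # spread across samples
    return bias2, variance

simple_bias2, simple_var = bias2_and_variance(1)
complex_bias2, complex_var = bias2_and_variance(5)
print(f"degree 1: bias^2 {simple_bias2:.4f}, variance {simple_var:.4f}")
print(f"degree 5: bias^2 {complex_bias2:.4f}, variance {complex_var:.4f}")
```

The simpler model shows the larger squared bias and the smaller variance, and the more flexible model shows the reverse, which is the tradeoff described above.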
Too simple models: bias error
Too complex models: variance error
Source: Scott Fortmann-Roe
Regularization techniques (L1 or L2)
A good option would be applying .
Using ensemble methods like bagging, and resampling techniques
For further details, check .
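As a minimal sketch of the L2 option from the list above, the snippet below fits ridge regression in closed form on polynomial features and shows how a larger penalty shrinks the coefficients. The data, the degree-9 feature set, the penalty values, and the helper `ridge_fit` are all assumptions for illustration; for simplicity the intercept is penalized along with the other coefficients, which real implementations often avoid.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, 30)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.3, 30)

# Polynomial features 1, x, x^2, ..., x^9: a deliberately flexible model.
X = np.vander(x, 10, increasing=True)

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares: (X^T X + lam * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

norms = {}
for lam in (1e-3, 1e-1, 10.0):
    w = ridge_fit(X, y, lam)
    norms[lam] = float(np.linalg.norm(w))
    print(f"lambda {lam:g}: coefficient norm {norms[lam]:.3f}")
```

A larger penalty pulls the coefficients toward zero, which lowers variance at the cost of some added bias; the penalty strength is itself a complexity setting that would normally be chosen by validation.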
Akaike’s information criterion (AIC) compares the quality of a set of statistical models to each other. For example, you might be interested in what variables contribute to low socioeconomic status and how each variable contributes to that status. Let’s say you create several models for various factors like education, family size, or disability status; AIC lets you score each model and rank them from best to worst. The “best” model will be the one that neither under-fits nor over-fits.
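To make the ranking concrete, here is a hedged sketch that uses AIC to compare polynomial models of different degrees on synthetic data, rather than the socioeconomic example above. It assumes Gaussian errors, in which case AIC for a least-squares fit equals n ln(RSS / n) + 2k up to a constant shared by all models, with k the number of estimated parameters. The function `gaussian_aic` is a helper written for this example.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(5)
n = 60
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)

def gaussian_aic(x, y, degree):
    """AIC (up to an additive constant) for a least-squares polynomial fit."""
    residuals = y - Polynomial.fit(x, y, degree)(x)
    rss = float(np.sum(residuals ** 2))
    k = degree + 2        # degree + 1 coefficients, plus the noise variance
    return len(x) * np.log(rss / len(x)) + 2 * k

aic = {degree: float(gaussian_aic(x, y, degree)) for degree in range(1, 9)}
ranking = sorted(aic, key=aic.get)            # lowest AIC first

for degree in ranking:
    print(f"degree {degree}: AIC {aic[degree]:.1f}")
```

The first term rewards goodness of fit and the 2k term penalizes extra parameters, so the lowest-AIC model is the one that balances the two. Only differences in AIC between models fitted to the same data are meaningful, not the absolute values.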