Bias-Variance Tradeoff


More reading: Bias-Variance Tradeoff (Wikipedia)

What is the bias error?

In statistics, an estimator's bias (or bias function) is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. So, bias measures the difference between the value estimated by a model or measurement method and the real one.

[Figure: The bias is expressed as the systematic error]

Bias refers to the difference between the true or correct value of some quantity and the measurement or estimate of that quantity. In principle, it cannot be calculated unless that true or correct value is known, although this problem bites to varying degrees:

  • In the simplest kind of problem, the true value is known (as when the center of a target is visible and the distance of a shot from the center can be measured; this is a common analogy) and bias is then usually calculated as the difference between the true value and the mean (or occasionally some other summary) of measurements or estimates.

  • In other problems, some careful method is regarded as the state of the art, yielding the best possible measurements, and other methods are judged more or less biased according to their degree of systematic departure from it (in some fields such a reference method is termed a gold standard).

  • In yet other problems, we have one or more methods all deficient to some degree, and assessment of bias is then difficult or impossible. It is then tempting, or possibly even natural, to change the question and judge truth according to consistency between methods.
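As a quick sanity check of the definition, the snippet below (a minimal sketch using NumPy; the normal distribution, sample size, and trial count are arbitrary choices for illustration) empirically estimates the bias of the divide-by-n sample-variance estimator, in a setting where the true value is known by construction:

```python
# Compare the biased (divide-by-n) and unbiased (divide-by-n-1)
# sample-variance estimators against the known true variance.
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0          # known truth: samples come from N(0, 2^2)
n, trials = 10, 100_000

biased, unbiased = [], []
for _ in range(trials):
    x = rng.normal(0.0, 2.0, size=n)
    biased.append(x.var(ddof=0))    # divides by n
    unbiased.append(x.var(ddof=1))  # divides by n - 1

# bias = E[estimator] - true value; about -0.4 for ddof=0, about 0 for ddof=1
print("bias (ddof=0):", np.mean(biased) - true_var)
print("bias (ddof=1):", np.mean(unbiased) - true_var)
```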

What is variance error?

Variance is the variability of a model's prediction for a given data point, i.e., a value that tells us the spread of our predictions. A model with high variance pays a lot of attention to training data and does not generalize to unseen data. As a result, such models perform very well on training data but have high error rates on test data.

Bias vs Variance

A high bias error is due to erroneous or overly simplistic assumptions in the learning algorithm you’re using. This can lead to the model underfitting your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set.

A high variance error is due to too much complexity in the learning algorithm you’re using. This leads to the algorithm being highly sensitive to high degrees of variation in your training data, which can lead your model to overfit the data. You’ll be carrying too much noise from your training data for your model to be very useful for your test data.

Error decomposition

The bias-variance decomposition essentially decomposes the learning error of any algorithm into the bias, the variance, and a bit of irreducible error due to noise in the underlying dataset. Essentially, if you make the model more complex and add more variables, you'll lose bias but gain variance; to reach the optimally reduced amount of error, you'll have to trade off bias and variance. You don't want either high bias or high variance in your model.

total_error = bias_error² + variance_error + irreducible_error
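The decomposition can be checked numerically. The sketch below is an illustrative setup, assuming y = sin(x) plus Gaussian noise and two polynomial model complexities (all choices are assumptions of this example): each model is refit on many resampled training sets, and the bias² and variance of its prediction at one fixed test point are estimated across those fits.

```python
# Toy Monte Carlo check of the bias-variance decomposition.
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                   # true function
noise_sd = 0.3               # irreducible error = noise_sd**2
x_test = 1.0
n, trials = 30, 2000

for degree in (1, 9):        # simple vs complex polynomial
    preds = []
    for _ in range(trials):
        x = rng.uniform(0, 2 * np.pi, n)
        y = f(x) + rng.normal(0, noise_sd, n)
        coefs = np.polyfit(x, y, degree)
        preds.append(np.polyval(coefs, x_test))
    preds = np.array(preds)
    bias2 = (preds.mean() - f(x_test)) ** 2   # squared bias at x_test
    var = preds.var()                          # variance across training sets
    print(f"degree={degree}: bias^2={bias2:.4f}, variance={var:.4f}, "
          f"expected error ~ {bias2 + var + noise_sd**2:.4f}")
```

The simple model (degree 1) shows a large bias² term, while the complex model (degree 9) shows near-zero bias but much larger variance, matching the decomposition above.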

How can we overcome it?

There is a sweet spot of model complexity at which total error is minimized. If our model complexity exceeds this sweet spot, we are in effect overfitting our model, while if our complexity falls short of the sweet spot, we are underfitting the model. In practice, there is no analytical way to find this location. Instead, we must use an accurate measure of prediction error, explore differing levels of model complexity, and then choose the complexity level that minimizes the overall error. A key to this process is selecting an accurate error measure, since grossly inaccurate measures are often used and can be deceptive. (Scott Fortmann-Roe, Understanding the Bias-Variance Tradeoff)
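A minimal sketch of that empirical search, assuming a synthetic regression dataset and decision-tree depth as the complexity knob (both arbitrary choices for illustration):

```python
# Sweep a complexity knob and pick the level with the best
# cross-validated score (a resampling-based error estimate).
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0,
                       random_state=0)

scores = {}
for depth in range(1, 15):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    scores[depth] = cross_val_score(model, X, y, cv=5).mean()  # 5-fold R^2

best = max(scores, key=scores.get)
print(f"best depth: {best} (CV R^2 = {scores[best]:.3f})")
```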

Some ways to manage the bias-variance tradeoff:

  • A good option would be applying Regularization techniques (L1 or L2).

  • Using ensemble methods like bagging and resampling techniques. One modeling algorithm that makes use of bagging is Random Forests. Here, the bias of the full model is equivalent to the bias of a single decision tree (which itself has high variance). By creating many of these trees, in effect a "forest", and then averaging them, the variance of the final model can be greatly reduced over that of a single tree (a sketch follows below). For further details, check the Ensemble methods page.
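A rough illustration of that variance reduction, on synthetic data with arbitrarily chosen hyperparameters (all assumptions of this sketch): fit single deep trees and bagged forests on bootstrap resamples of the training set, then compare how much their predictions at one fixed point vary.

```python
# Bagging reduces prediction variance: single trees vs a random forest.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=0)
x_query = X[:1]                 # one fixed query point
rng = np.random.default_rng(0)

tree_preds, forest_preds = [], []
for seed in range(30):
    idx = rng.integers(0, len(X), len(X))      # bootstrap resample
    Xb, yb = X[idx], y[idx]
    tree_preds.append(
        DecisionTreeRegressor(random_state=seed).fit(Xb, yb).predict(x_query)[0])
    forest_preds.append(
        RandomForestRegressor(n_estimators=100, random_state=seed)
        .fit(Xb, yb).predict(x_query)[0])

# the forest's predictions vary far less across resamples
print("single-tree prediction variance:", np.var(tree_preds))
print("forest prediction variance:    ", np.var(forest_preds))
```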

Additional remarks:

  • Generally, resampling-based measures such as cross-validation should be preferred over theoretical measures such as Akaike's Information Criterion (AIC).

  • Adjusting hyperparameters in some estimators:

    • Both the k-nearest neighbors (kNN) and Support Vector Machine (SVM) algorithms have low bias and high variance, but the trade-off in both cases can be changed (see the sketch after this list):

      • In the kNN algorithm, the value of k can be increased, which increases the number of neighbors that contribute to each prediction. This in turn increases the bias of the model and lowers its variance.

      • In the SVM algorithm, the trade-off can be changed through the C parameter, which controls how many violations of the margin are allowed in the training data; allowing more violations increases the bias but decreases the variance.
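The sketch below illustrates both knobs on a synthetic dataset (the data and parameter grids are assumptions of this example). One convention caveat: in scikit-learn's SVC, C is an inverse regularization strength, so decreasing C is what allows more margin violations and plays the bias-increasing role described above.

```python
# Larger k smooths kNN (more bias, less variance); smaller sklearn C
# regularizes the SVM more. Watch the train/test gap shrink as bias grows.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 100):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"kNN k={k:3d}: train={clf.score(X_tr, y_tr):.3f} "
          f"test={clf.score(X_te, y_te):.3f}")

for C in (100.0, 1.0, 0.01):
    clf = SVC(C=C).fit(X_tr, y_tr)
    print(f"SVM C={C:6.2f}: train={clf.score(X_tr, y_tr):.3f} "
          f"test={clf.score(X_te, y_te):.3f}")
```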

This tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa. The bias-variance dilemma or problem is the conflict in trying to simultaneously minimize these two sources of error, which prevent supervised learning algorithms from generalizing beyond their training set.

[Figure: Relation with overfit and underfit]

Too Simple Models → Bias Error
Too Complex Models → Variance Error


Akaike’s information criterion () compares the quality of a set of statistical models to each other. For example, you might be interested in what variables contribute to low socioeconomic status and how each variable contributes to that status. Let’s say you create several models for various factors like education, family size, or disability status; the AIC will take each model and rank them from best to worst. The “best” model will be the one that neither under-fits nor over-fits.
