Ridge vs Lasso

Overview

  • Lasso: L1 regularization, penalty term $\text{loss} + \lambda \sum_{j=1}^{p} |\beta_j|$

  • Ridge: L2 regularization, penalty term $\text{loss} + \lambda \sum_{j=1}^{p} \beta_j^2$
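
As a minimal sketch of how these penalties enter the objective (the function and variable names, and the use of plain NumPy with a squared-error loss, are illustrative assumptions, not part of the original):

```python
import numpy as np

def ridge_objective(X, y, beta, lam):
    """Squared-error loss plus the L2 penalty: loss + lam * sum(beta_j ** 2)."""
    loss = np.sum((y - X @ beta) ** 2)
    return loss + lam * np.sum(beta ** 2)

def lasso_objective(X, y, beta, lam):
    """Squared-error loss plus the L1 penalty: loss + lam * sum(|beta_j|)."""
    loss = np.sum((y - X @ beta) ** 2)
    return loss + lam * np.sum(np.abs(beta))
```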

About the equations

Both regressions can be thought of as solving a constrained optimization problem: each minimizes the loss subject to the sum of its regularized coefficients being less than or equal to a constant s, where such an s exists for each value of the shrinkage factor λ (the constrained forms are sketched after the list below).

  • Ridge: $\beta_1^2 + \beta_2^2 \le s$. The ridge coefficients are the ones with the smallest RSS among all points that lie within the circle defined by this inequality.

  • Lasso: $|\beta_1| + |\beta_2| \le s$. The lasso coefficients are the ones with the smallest RSS among all points that lie within the diamond defined by this inequality.
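
Written out explicitly, the two constrained problems look like this (a sketch of the standard two-predictor formulation; RSS denotes the residual sum of squares):

```latex
% Ridge: smallest RSS over the circular (L2) constraint region
\min_{\beta_1,\beta_2}\ \mathrm{RSS}(\beta_1,\beta_2)
\quad \text{subject to} \quad \beta_1^2 + \beta_2^2 \le s

% Lasso: smallest RSS over the diamond-shaped (L1) constraint region
\min_{\beta_1,\beta_2}\ \mathrm{RSS}(\beta_1,\beta_2)
\quad \text{subject to} \quad |\beta_1| + |\beta_2| \le s
```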

Conclusions

The key difference between these techniques is that lasso shrinks the coefficients of the less important features all the way to zero, removing some features altogether. It therefore works well for feature selection when we have a huge number of features.

This highlights the main disadvantage of ridge regression: model interpretability. Ridge shrinks the coefficients of the least important predictors very close to zero, but it never makes them exactly zero, so the final model always includes all predictors. With the lasso, however, the L1 penalty forces some of the coefficient estimates to be exactly zero when the tuning parameter λ is sufficiently large. The lasso therefore also performs variable selection and is said to yield sparse models.
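
To see the sparsity difference concretely, here is a small sketch (assuming scikit-learn is available; the synthetic dataset, alpha values, and variable names are illustrative choices, not from the original):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 100 samples, 10 features, only the first 3 truly matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
y = X @ true_beta + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Ridge shrinks the unimportant coefficients toward zero but keeps them nonzero;
# Lasso sets many of them exactly to zero, performing variable selection.
print("ridge coefs:", np.round(ridge.coef_, 3))
print("lasso coefs:", np.round(lasso.coef_, 3))
print("lasso coefficients set exactly to zero:", int(np.sum(lasso.coef_ == 0.0)))
```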
