Ridge vs Lasso
Both regressions can be thought of as solving a constrained optimization problem: minimize the residual sum of squares (RSS) subject to the constraint that a summary of the regularized coefficients is less than or equal to a budget $s$, where $s$ is a constant that corresponds to each value of the shrinkage factor $\lambda$.
Ridge: $\sum_{j=1}^{p} \beta_j^2 \le s$. This implies that the ridge coefficients have the smallest RSS among all points that lie within the circle given by this inequality.
Lasso: $\sum_{j=1}^{p} |\beta_j| \le s$. This implies that the lasso coefficients have the smallest RSS among all points that lie within the diamond given by this inequality.
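For reference, the equivalent penalized (Lagrangian) forms of the two problems, written with coefficients $\beta_j$ and tuning parameter $\lambda$, differ only in the penalty term; this is a standard reformulation of the constrained versions above, not part of the original text:

```latex
% Ridge: RSS plus a squared (L2) penalty on the coefficients
\hat{\beta}^{\mathrm{ridge}} = \arg\min_{\beta}\; \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} \beta_j^2

% Lasso: RSS plus an absolute-value (L1) penalty on the coefficients
\hat{\beta}^{\mathrm{lasso}} = \arg\min_{\beta}\; \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} |\beta_j|
```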
The key difference between these techniques is that the lasso shrinks the less important features' coefficients all the way to zero, thus removing some features altogether. This makes it work well for feature selection when we have a huge number of features.
This sheds light on the main disadvantage of ridge regression: model interpretability. Ridge will shrink the coefficients of the least important predictors very close to zero, but it will never make them exactly zero, so the final model includes all predictors. In the case of the lasso, however, the L1 penalty forces some of the coefficient estimates to be exactly zero when the tuning parameter λ is sufficiently large. The lasso therefore also performs variable selection and is said to yield sparse models.
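A minimal sketch of this difference, assuming scikit-learn is available and using synthetic data (the dataset, alpha value, and feature counts below are illustrative choices, not from the original text):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 100 features, only 10 of which are truly informative.
X, y = make_regression(n_samples=200, n_features=100, n_informative=10,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty (alpha plays the role of lambda)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

print("Lasso coefficients exactly zero:", np.sum(lasso.coef_ == 0.0))
print("Ridge coefficients exactly zero:", np.sum(ridge.coef_ == 0.0))
# Typically the lasso zeroes out most of the 100 coefficients (a sparse model),
# while the ridge coefficients are small but nonzero, so ridge keeps every predictor.
```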
Lasso: L1 penalty
Ridge: L2 penalty