### Regularization from a Bayesian Perspective

#### Ridge Regression

Typically, ridge (ℓ2) penalties are better than ℓ1 penalties at minimizing prediction error. The reason is that when two predictors are highly correlated, the ℓ1 regularizer tends to simply pick one of the two and zero out the other. In contrast, the ℓ2 regularizer keeps both of them and jointly shrinks the corresponding coefficients a little. Thus, while the ℓ1 penalty can certainly reduce overfitting, it may also cost some predictive power.
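This behavior is easy to observe empirically. Below is a minimal sketch using scikit-learn's `Lasso` and `Ridge` on synthetic data (the data-generating setup and the penalty strengths `alpha` are illustrative choices, not from the original text): with two nearly identical predictors, the lasso tends to zero out one coefficient, while ridge shrinks both toward a similar value.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200

# Two highly correlated predictors: x2 is x1 plus a tiny perturbation.
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)

# L1 penalty: coordinate descent concentrates the weight on one predictor.
lasso = Lasso(alpha=0.5).fit(X, y)

# L2 penalty: the closed-form solution spreads weight across both predictors.
ridge = Ridge(alpha=50.0).fit(X, y)

print("lasso coefficients:", lasso.coef_)
print("ridge coefficients:", ridge.coef_)
```

On a run like this, the lasso typically leaves one coefficient at (or very near) zero, whereas the two ridge coefficients come out nearly equal, illustrating the "pick one" versus "shrink both jointly" contrast described above.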

#### Summary

P.S.: While writing this article I drew on many posts from Zhihu and elsewhere online, and also added some of my own understanding. Fellow machine learning enthusiasts are warmly welcome to discuss and exchange ideas!