Ridge (Statistics)

#Math
Regularization method that shrinks all coefficients toward zero, while generally leaving them non-zero

$\displaystyle \lambda\sum_{j = 1}^{J}\beta_{j}^{2}=\lambda \lVert \beta\rVert_{2}^{2}$

  • This ridge term penalizes large coefficients, shrinking each feature's coefficient $\displaystyle \beta_{j}$ toward zero
  • $\displaystyle J$ is the number of parameters
  • Equals the squared L2 norm of the coefficient vector $\displaystyle \beta$, scaled by $\displaystyle \lambda$
  • This term is added to the loss function
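
A minimal sketch of how the penalty enters a loss, assuming a squared-error base loss and no intercept (the note itself does not fix either). The function names `ridge_penalty`, `ridge_loss`, and `ridge_fit` are hypothetical and used only for illustration.

```python
import numpy as np

def ridge_penalty(beta, lam):
    """Ridge term: lam * sum_j beta_j^2, i.e. lam times the squared L2 norm."""
    beta = np.asarray(beta, dtype=float)
    return lam * np.sum(beta ** 2)

def ridge_loss(X, y, beta, lam):
    """Assumed base loss (sum of squared residuals) plus the ridge term."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + ridge_penalty(beta, lam)

def ridge_fit(X, y, lam):
    """Minimizer of ridge_loss under the assumptions above:
    solves (X^T X + lam * I) beta = X^T y."""
    J = X.shape[1]  # number of parameters
    return np.linalg.solve(X.T @ X + lam * np.eye(J), X.T @ y)

# Small illustrative data set (made up for this sketch).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])

beta_unpenalized = ridge_fit(X, y, lam=0.0)   # ordinary least squares
beta_ridge = ridge_fit(X, y, lam=10.0)        # shrunk toward zero, not to zero
print(beta_unpenalized, beta_ridge)
```

With a larger `lam`, the fitted coefficients have a smaller L2 norm than the unpenalized fit, but none of them is driven exactly to zero.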