Takes small steps using first-derivative information to arrive at a root or minimum of a function. Similar to Newton's method, except it converges more slowly when using constant step sizes.
Update rule: $\theta_{\text{new}} = \theta_{\text{old}} - \eta \, \nabla E(\theta_{\text{old}})$, where:
- $\theta_{\text{new}}$ are the new parameters for our ML model
- $\eta$ is the step size
- $E$ is the error function (we want to minimize error)
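As a quick numerical illustration (a toy example not in the notes, assuming the error function is $E(\theta) = \theta^2$), one update step looks like this in Python:

```python
# One gradient-descent step on the toy error E(theta) = theta^2,
# whose derivative is dE/dtheta = 2 * theta. (Hypothetical example;
# the notes do not fix a particular error function.)
theta = 5.0  # current parameter value
eta = 0.1    # step size

grad = 2 * theta            # first-derivative information
theta = theta - eta * grad  # small step downhill

print(theta)  # 4.0 -- moved toward the minimum at theta = 0
```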
Training
Gradient Descent
- GradientDescent($\theta_0$, $\eta$, $T$)
    - For epoch $t = 1, \dots, T$:
        - $\theta \leftarrow \theta - \eta \, \nabla E(\theta)$ (gradient computed over the full dataset)
    - return $\theta$
- Time complexity depends on the number of samples $n$ and the dimensionality $d$: each epoch computes the gradient over all $n$ samples, so an epoch costs on the order of $n \cdot d$ (see the sketch below)
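A minimal batch gradient-descent sketch, assuming a linear model with squared error (the notes leave the model and loss unspecified; `eta` and `epochs` are illustrative choices):

```python
import numpy as np

def gradient_descent(X, y, eta=0.1, epochs=500):
    """Batch gradient descent, assuming E(theta) = (1/2n) ||X @ theta - y||^2,
    whose gradient is (1/n) X^T (X @ theta - y)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):                   # one pass = one epoch
        grad = X.T @ (X @ theta - y) / n      # gradient over ALL n samples: ~O(n*d)
        theta = theta - eta * grad            # step of size eta downhill
    return theta

# Usage on synthetic data: recover theta = [2, -3]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0])
print(gradient_descent(X, y))  # ~[ 2. -3.]
```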
Stochastic Gradient Descent
- StochasticGradientDescent($\theta_0$, $\eta$, $T$)
    - For epoch $t = 1, \dots, T$:
        - For each datapoint $i$ (in shuffled order):
            - $\theta \leftarrow \theta - \eta \, \nabla E_i(\theta)$
            - The gradient is calculated on just one datapoint, so each update is cheap but noisy
    - return $\theta$
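A matching SGD sketch under the same assumed linear-model, squared-error setup: each update touches a single datapoint, so an epoch is $n$ cheap updates of cost on the order of $d$ rather than one full-batch step.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_gradient_descent(X, y, eta=0.05, epochs=20):
    """SGD for the same assumed least-squares setup: each update uses
    the gradient of the error on ONE datapoint, E_i(theta)."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):               # visit datapoints in shuffled order
            residual = X[i] @ theta - y[i]         # scalar error on datapoint i
            theta = theta - eta * residual * X[i]  # gradient on just this point
    return theta

# Usage on the same kind of synthetic data as the batch version
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -3.0])
print(stochastic_gradient_descent(X, y))  # ~[ 2. -3.]
```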