Takes small steps using first-derivative information to arrive at a root or minimum of a function. Similar to Newton's Method, except it converges more slowly when using a constant step size.

  • Update rule: $\theta_{\text{new}} = \theta_{\text{old}} - \alpha \nabla J(\theta_{\text{old}})$ (a minimal code sketch follows this list)
  • $\theta_{\text{new}}$ is the new parameters for our ML model
  • $\alpha$ is the step size
  • $J(\theta)$ is the error function (we want to minimize error)
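
A minimal sketch of one update step in Python, assuming a hypothetical error function $J(\theta) = \theta^2$ (so $\nabla J(\theta) = 2\theta$); the names here are illustrative:

```python
import numpy as np

# Hypothetical error function J(theta) = theta^2; its gradient is 2*theta
def grad_J(theta):
    return 2 * theta

theta = np.array([5.0])  # current parameters
alpha = 0.1              # step size

# One gradient-descent step: theta_new = theta_old - alpha * grad J(theta_old)
theta = theta - alpha * grad_J(theta)
print(theta)  # one step: 5.0 -> 4.0, moving toward the minimum at 0
```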

Training

Gradient Descent

  • GradientDescent($\theta$, $\alpha$, epochs) — see the sketch below this list
    • For epoch $= 1 \ldots$ epochs:
      • $\theta \leftarrow \theta - \alpha \nabla J(\theta)$, where the gradient is computed over the entire dataset
    • return $\theta$

  • Time complexity depends on the number of samples $n$ and the dimensionality $d$: each epoch computes the gradient over all $n$ samples, costing $O(nd)$ per epoch for a linear model
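
A runnable sketch of the full-batch version, assuming a linear model with mean-squared error; the function name and toy data are illustrative, not from the notes:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, epochs=100):
    """Full-batch gradient descent for linear regression with MSE."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        # Gradient of MSE over ALL n samples: each epoch costs O(n * d)
        grad = (2 / n) * X.T @ (X @ theta - y)
        theta = theta - alpha * grad
    return theta

# Usage: fit y = 3x on toy data
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 6.0, 9.0])
print(gradient_descent(X, y, alpha=0.05, epochs=500))  # approaches [3.0]
```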

Stochastic Gradient Descent

  • StochasticGradientDescent($\theta$, $\alpha$, epochs) — see the sketch below this list
    • For epoch $= 1 \ldots$ epochs:
      • For each datapoint $(x_i, y_i)$, typically in shuffled order:
          • $\theta \leftarrow \theta - \alpha \nabla J(\theta; x_i, y_i)$ — the gradient is calculated on just one datapoint
    • return $\theta$
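
A matching sketch of SGD under the same illustrative linear-regression setup; each update uses the gradient from a single sample, so updates are cheap but noisy:

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=100):
    """SGD for linear regression with MSE: one-sample gradient per update."""
    n, d = X.shape
    theta = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(n):  # visit samples in shuffled order each epoch
            # Gradient estimated from just one datapoint (x_i, y_i)
            grad = 2 * X[i] * (X[i] @ theta - y[i])
            theta = theta - alpha * grad
    return theta

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 6.0, 9.0])
print(stochastic_gradient_descent(X, y, alpha=0.05, epochs=200))  # ~[3.0]
```

Shuffling each epoch is a common design choice: it decorrelates consecutive one-sample gradients so the noisy updates still track the full-batch direction on average.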