Takes small steps using first-derivative information to arrive at a root or minimum of a function. Similar to Newton's Method, except it converges more slowly when using a constant step size.

  • Update rule: $\theta_{\text{new}} = \theta_{\text{old}} - \alpha \nabla J(\theta_{\text{old}})$ (a minimal code sketch follows this list)
  • $\theta_{\text{new}}$ is the new parameters for our ML model
  • $\alpha$ is the step size
  • $J(\theta)$ is the error function (we want to minimize error)
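
A minimal sketch of one update step in Python, assuming a hypothetical error function $J(\theta) = \theta^2$ (so $\nabla J(\theta) = 2\theta$); the names here are illustrative:

```python
import numpy as np

# Hypothetical error function J(theta) = theta^2; its gradient is 2*theta
def grad_J(theta):
    return 2 * theta

theta = np.array([5.0])  # current parameters
alpha = 0.1              # step size

# One gradient-descent step: theta_new = theta_old - alpha * grad J(theta_old)
theta = theta - alpha * grad_J(theta)
print(theta)  # one step: 5.0 -> 4.0, moving toward the minimum at 0
```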

Training

Gradient Descent

  • GradientDescent($\theta$, $\alpha$, epochs) — see the sketch below this list
    • For epoch $= 1 \ldots$ epochs:
      • $\theta \leftarrow \theta - \alpha \nabla J(\theta)$, where the gradient is computed over the entire dataset
    • return $\theta$

  • Time complexity depends on the number of samples $n$ and the dimensionality $d$: each epoch computes the gradient over all $n$ samples, costing $O(nd)$ per epoch for a linear model
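
A runnable sketch of the full-batch version, assuming a linear model with mean-squared error; the function name and toy data are illustrative, not from the notes:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.01, epochs=100):
    """Full-batch gradient descent for linear regression with MSE."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(epochs):
        # Gradient of MSE over ALL n samples: each epoch costs O(n * d)
        grad = (2 / n) * X.T @ (X @ theta - y)
        theta = theta - alpha * grad
    return theta

# Usage: fit y = 3x on toy data
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 6.0, 9.0])
print(gradient_descent(X, y, alpha=0.05, epochs=500))  # approaches [3.0]
```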

Stochastic Gradient Descent

  • StochasticGradientDescent($\theta$, $\alpha$, epochs) — see the sketch below this list
    • For epoch $= 1 \ldots$ epochs:
      • For each datapoint $(x_i, y_i)$, typically in shuffled order:
          • $\theta \leftarrow \theta - \alpha \nabla J(\theta; x_i, y_i)$ — the gradient is calculated on just one datapoint
    • return $\theta$
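
A matching sketch of SGD under the same illustrative linear-regression setup; each update uses the gradient from a single sample, so updates are cheap but noisy:

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=100):
    """SGD for linear regression with MSE: one-sample gradient per update."""
    n, d = X.shape
    theta = np.zeros(d)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(n):  # visit samples in shuffled order each epoch
            # Gradient estimated from just one datapoint (x_i, y_i)
            grad = 2 * X[i] * (X[i] @ theta - y[i])
            theta = theta - alpha * grad
    return theta

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 6.0, 9.0])
print(stochastic_gradient_descent(X, y, alpha=0.05, epochs=200))  # ~[3.0]
```

Shuffling each epoch is a common design choice: it decorrelates consecutive one-sample gradients so the noisy updates still track the full-batch direction on average.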