A method for tuning the parameters of a neural network. It is often paired with gradient descent, where the gradient of the loss function with respect to each parameter is calculated with the chain rule over a computation graph.
- The derivative of the loss function with respect to the last layer’s parameters
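As a rough illustration of the chain rule at work, here is a minimal sketch in plain Python. It assumes a toy network with one hidden unit, `y_hat = w2 * tanh(w1 * x)`, and a squared-error loss. The network shape, the function names (`forward`, `loss_fn`, `backward`, `train`), and the learning rate are all hypothetical choices made for this example, not something the definition above prescribes. The backward pass starts with the derivative for the last layer's parameter and then works back to the earlier one.

```python
import math

def forward(w1, w2, x):
    """Forward pass through the toy computation graph (assumed architecture)."""
    h = math.tanh(w1 * x)   # hidden activation
    y_hat = w2 * h          # output of the last layer
    return h, y_hat

def loss_fn(w1, w2, x, y):
    """Squared-error loss for a single example."""
    _, y_hat = forward(w1, w2, x)
    return (y_hat - y) ** 2

def backward(w1, w2, x, y):
    """Apply the chain rule from the loss back to each parameter."""
    h, y_hat = forward(w1, w2, x)
    dL_dyhat = 2.0 * (y_hat - y)            # dL/dy_hat
    dL_dw2 = dL_dyhat * h                   # last layer's parameter comes first
    dL_dh = dL_dyhat * w2                   # propagate back to the hidden unit
    dL_dw1 = dL_dh * (1.0 - h * h) * x      # tanh'(z) = 1 - tanh(z)^2
    return dL_dw1, dL_dw2

def train(w1, w2, x, y, lr=0.1, steps=200):
    """Plain gradient descent using the gradients from backward()."""
    for _ in range(steps):
        g1, g2 = backward(w1, w2, x, y)
        w1 -= lr * g1
        w2 -= lr * g2
    return w1, w2
```

Calling `train(0.5, 0.3, 1.0, 0.8)` returns updated values of `w1` and `w2`. Comparing `loss_fn` before and after training is a simple way to see whether the updates reduced the loss.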