A model that predicts the probability of a data point belonging to a certain class. This probability may then be used to classify (e.g. ) if desired
Topics
- Negative log of the product of log likelihood…
Training
Use gradient descent to minimize the negative log likelihood of the dataset (same as finding the Maximum Likelihood Estimate) by modifying , or the weights of the model
- The log likelihood of the dataset
- is the same as
- is the error of the th training sample
Testing
- is the sigmoid function
- is the activation function
- If
- Classify
- Else
- Classify
- is the dimensionality of the data