Softmax

Feb 09, 2025

Computers

$P(Y = i \mid X) = \frac{e^{\beta_{0, i} + \beta_{1, i} X}}{\sum_{j = 1}^{K} e^{\beta_{0, j} + \beta_{1, j} X}}$

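The probability that the prediction $Y$ belongs to class $i$, given some data $X$, when there are $K$ classes.
When one class's score $\beta_{0, i} + \beta_{1, i} X$ is well above the rest, its probability is pushed close to 1 and the others close to 0; when the scores are similar, the probabilities stay spread out.
The denominator is a normalization term: it makes the $K$ probabilities sum to 1, so they form a valid probability distribution.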
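
The formula can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `softmax_probabilities` and the coefficient values are made up for the example, and subtracting the largest score before exponentiating is a common numerical-stability trick that is not part of the formula itself (it cancels in the ratio, so the result is unchanged).

```python
import math

def softmax_probabilities(x, intercepts, slopes):
    """Return P(Y = i | X = x) for each of the K classes."""
    # Linear score for each class: beta_{0,i} + beta_{1,i} * x
    scores = [b0 + b1 * x for b0, b1 in zip(intercepts, slopes)]
    # Shifting by the max score cancels in the ratio but avoids overflow in exp
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)  # the normalization term (the denominator)
    return [e / total for e in exps]

# Hypothetical coefficients for K = 3 classes (assumed values, for illustration only)
intercepts = [0.0, 0.5, -0.5]
slopes = [1.0, 2.0, 3.0]

probs = softmax_probabilities(2.0, intercepts, slopes)
probs_large = softmax_probabilities(20.0, intercepts, slopes)
```

With these assumed coefficients, `probs` sums to 1 and gives the highest probability to the third class, whose score is largest at `x = 2.0`. At `x = 20.0` the scores are much further apart, so `probs_large` puts almost all of the probability on that same class.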
