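$$P(Y = i \mid X) = \frac{e^{\beta_{0,i} + \beta_{1,i} X}}{\sum_{j=1}^{K} e^{\beta_{0,j} + \beta_{1,j} X}}$$

This is the probability that our prediction Y is of class i, given some data X, when there are K classes.

When one class's exponent is clearly larger than the others, this has the effect of setting that term close to 1 and the others close to 0.

The denominator is a normalization term that makes this a valid probability: the K terms are positive and sum to 1.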
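
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `class_probabilities` and the coefficient values are hypothetical, X is assumed to be a single scalar feature, and the max-subtraction step is a common numerical safeguard that is not part of the formula itself (it leaves the ratios unchanged).

```python
import math

def class_probabilities(x, beta0, beta1):
    """Return [P(Y=i | X=x) for i in 1..K] for a scalar feature x."""
    # Linear score for each class: beta0[i] + beta1[i] * x
    scores = [b0 + b1 * x for b0, b1 in zip(beta0, beta1)]
    # Subtracting the largest score does not change the result,
    # but keeps math.exp from overflowing on large scores.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    # The denominator: the sum over all K classes.
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical coefficients for K = 3 classes (illustrative values only).
beta0 = [0.0, 1.0, -1.0]
beta1 = [2.0, 0.5, -1.0]

probs = class_probabilities(3.0, beta0, beta1)
print(probs)       # one probability per class
print(sum(probs))  # should be 1 up to floating-point rounding
```

With these illustrative coefficients the first class has by far the largest score at x = 3, so its probability dominates and the other two are pushed toward 0, which is the behaviour described above.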