* $\displaystyle a_{0}^{(i)}$ is a bias term

* $\displaystyle \boldsymbol{a}^{(i)}$ is the activation of the $\displaystyle i$th layer of the neural network and is the same as:
  $$
  \begin{pmatrix}
  a_{0}^{(i)} \\
  \vdots \\
  a_{n_{i}}^{(i)}
  \end{pmatrix}
  $$
* Where $\displaystyle n_{i}$ is the number of nodes in the $\displaystyle i$th layer
* $\displaystyle \sigma(\cdot)$ is the [[sigmoid function]] and is applied to each entry of the vector
* $\displaystyle \boldsymbol{\Theta}^{(i)}$ is the weight matrix going from the $\displaystyle i$th layer to the $\displaystyle (i+1)$th layer (it can also be represented as $\displaystyle \boldsymbol{W}$)
* The above example can be thought of as $\displaystyle \boldsymbol{a}^{(2)}=\sigma(\boldsymbol{\Theta}^{(1)}\boldsymbol{a}^{(1)})$, as sketched in the code below
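
To make the notation concrete, here is a minimal NumPy sketch of a single forward-propagation step, $\displaystyle \boldsymbol{a}^{(i+1)}=\sigma(\boldsymbol{\Theta}^{(i)}\boldsymbol{a}^{(i)})$, with the bias term $\displaystyle a_{0}^{(i)}=1$ prepended before the matrix multiplication. The function and variable names (`forward_step`, `theta1`, `a1`) are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid applied elementwise to each entry of the vector."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_step(theta, a_prev):
    """One forward-propagation step: a^(i+1) = sigma(Theta^(i) a^(i)).

    theta  -- weight matrix Theta^(i) mapping layer i to layer i+1,
              shape (n_{i+1}, n_i + 1) to account for the bias unit
    a_prev -- activations a^(i) of layer i (without the bias entry), shape (n_i,)
    """
    # Prepend the bias term a_0^(i) = 1 so the vector has n_i + 1 entries
    a_with_bias = np.concatenate(([1.0], a_prev))
    return sigmoid(theta @ a_with_bias)

# Example: layer 1 has 3 units, layer 2 has 2 units
rng = np.random.default_rng(0)
theta1 = rng.standard_normal((2, 4))   # Theta^(1): shape (n_2, n_1 + 1)
a1 = np.array([0.5, -1.2, 0.3])        # a^(1)
a2 = forward_step(theta1, a1)          # a^(2) = sigma(Theta^(1) a^(1))
print(a2)
```

Stacking calls to a helper like `forward_step` across successive weight matrices reproduces the layer-by-layer computation described by the notation above.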