Perceptron
#Computers
Topics
$\displaystyle H=\left{ h|h:\mathbb{X}\rightarrow \mathbb{Y},h(\mathbf{x})=\text{sign}(a) \right}$
- The set of hypotheses $\displaystyle h(x)$ we hope to be $\displaystyle \hat{y}$, or the function that can predict the labels of a data point
- $\displaystyle a$ is the activation function
$$\displaystyle \text{Margin}(\mathcal{D},w)=\begin{cases}
\text{min}{(x,y)\in \mathcal{D}} \frac{1}{\left\lVert w\right\rVert}y{n}w^{T}x_{n} & \text{for separating hyperplane }w \\-\infty & \text{else}
\end{cases}$$
- The margin of a hyperplane (when varying $\displaystyle w$) is the minimum distance between the given hyperplane and a sample
- If there's no hyperplane, it's $\displaystyle -\infty$
$\displaystyle \text{Margin}(\mathcal{D})=\text{max}_{w}\text{Margin}(\mathcal{D},w)$
- The margin of a dataset is the margin of the hyperplane with the greatest margin
Training
- PerceptronTrain(data = N samples/instances = $\displaystyle \left{ (x_{1},y_{1}),\ldots ,(x_{n},y_{n}) \right}$, maxIter)
- for i = 1 through to maxIter
- Potentially shuffle data here
- for (x, y) in data
- $\displaystyle a:=w^{^{\top}}x$
- if $\displaystyle ay\leq 0$ (i.e. the data is misclassified) then
- $\displaystyle w=w+yx$
- return $\displaystyle w$
- for i = 1 through to maxIter
- AveragedPerceptronTrain(data = N samples/instances = $\displaystyle \left{ (x_{1},y_{1}),\ldots ,(x_{n},y_{n}) \right}$, maxIter)
- for i = 1 through to maxIter11
- Potentially shuffle here
- for (x, y) in data
- $\displaystyle a:=w^{^{\top}}x$
- if $\displaystyle ay\leq 0$ (i.e. the data is misclassified) then
- $\displaystyle w=w+yx$
- This effectively moves the boundary plane defined by $\displaystyle w$ toward classifying more correct examples
- $\displaystyle w=w+yx$
- $\displaystyle \mu=\mu+w$
- return $\displaystyle \mu$
- for i = 1 through to maxIter11