Perceptron

#Computers

Topics

$\displaystyle H=\left\{ h \mid h:\mathbb{X}\rightarrow \mathbb{Y},\ h(\mathbf{x})=\text{sign}(a) \right\}$

  • The hypothesis class: each hypothesis $\displaystyle h(x)$ produces a prediction $\displaystyle \hat{y}$, i.e. it is a function that predicts the label of a data point (a minimal sketch follows this list)
  • $\displaystyle a=w^{\top}x$ is the activation, the scalar score that gets thresholded (the activation itself, not an activation function)
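
A minimal NumPy sketch of one such hypothesis, assuming labels in $\{-1,+1\}$ and some given weight vector $w$ (the names here are illustrative, not from the notes):

```python
import numpy as np

def h(x: np.ndarray, w: np.ndarray) -> int:
    """One hypothesis from H: threshold the activation a = w^T x."""
    a = w @ x                     # a is the activation, a scalar score
    return 1 if a >= 0 else -1    # mapping sign(0) to +1 is a convention choice
```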

$$\displaystyle \text{Margin}(\mathcal{D},w)=\begin{cases}
\min_{(x_{n},y_{n})\in \mathcal{D}} \frac{1}{\left\lVert w\right\rVert}y_{n}w^{\top}x_{n} & \text{for separating hyperplane }w \\
-\infty & \text{else}
\end{cases}$$

  • The margin of a given hyperplane $\displaystyle w$ is the minimum distance between that hyperplane and any sample in $\displaystyle \mathcal{D}$
  • If $\displaystyle w$ does not separate the data, the margin is defined to be $\displaystyle -\infty$

$\displaystyle \text{Margin}(\mathcal{D})=\max_{w}\text{Margin}(\mathcal{D},w)$

  • The margin of a dataset is the largest margin achieved by any hyperplane, i.e. the margin of the maximum-margin separating hyperplane
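
A short NumPy sketch of $\text{Margin}(\mathcal{D},w)$ for a fixed $w$, assuming the rows of `X` are samples, `y` holds labels in $\{-1,+1\}$, and $w \neq 0$ (names are illustrative):

```python
import numpy as np

def margin(X: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    """Margin(D, w): smallest signed distance from a sample to the
    hyperplane w, or -inf when w does not separate the data."""
    scores = y * (X @ w) / np.linalg.norm(w)   # (1/||w||) y_n w^T x_n per sample
    m = float(scores.min())
    return m if m > 0 else float("-inf")       # non-separating => -inf
```

Maximizing this quantity over $w$ (e.g. with a max-margin solver such as an SVM) would yield $\text{Margin}(\mathcal{D})$.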

Training

  • PerceptronTrain(data = N samples/instances = $\displaystyle \left\{ (x_{1},y_{1}),\ldots ,(x_{N},y_{N}) \right\}$, maxIter) (see the Python sketch after this list)
    • initialize $\displaystyle w:=0$
    • for i = 1 to maxIter
      • Potentially shuffle data here
      • for (x, y) in data
        • $\displaystyle a:=w^{\top}x$
        • if $\displaystyle ay\leq 0$ (i.e. the sample is misclassified or on the boundary) then
          • $\displaystyle w:=w+yx$
    • return $\displaystyle w$
  • AveragedPerceptronTrain(data = N samples/instances = $\displaystyle \left\{ (x_{1},y_{1}),\ldots ,(x_{N},y_{N}) \right\}$, maxIter)
    • initialize $\displaystyle w:=0$, $\displaystyle \mu:=0$
    • for i = 1 to maxIter
      • Potentially shuffle data here
      • for (x, y) in data
        • $\displaystyle a:=w^{\top}x$
        • if $\displaystyle ay\leq 0$ (i.e. the sample is misclassified or on the boundary) then
          • $\displaystyle w:=w+yx$
            • This effectively moves the decision boundary defined by $\displaystyle w$ toward classifying this example correctly
        • $\displaystyle \mu:=\mu+w$ (accumulated after every sample, not only after updates)
    • return $\displaystyle \mu$
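
A runnable NumPy sketch of both procedures, assuming labels in $\{-1,+1\}$ and zero-initialized weights (the notes do not fix the initialization; function and variable names are illustrative):

```python
import numpy as np

def perceptron_train(X, y, max_iter, rng=None):
    """PerceptronTrain: update w on every misclassified sample.
    X is (N, d); y is (N,) with labels in {-1, +1}."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)                        # zero initialization (assumed)
    idx = np.arange(n)
    for _ in range(max_iter):
        rng.shuffle(idx)                   # the optional shuffle step
        for i in idx:
            a = w @ X[i]                   # activation a = w^T x
            if a * y[i] <= 0:              # misclassified (or on boundary)
                w = w + y[i] * X[i]        # move boundary toward this sample
    return w

def averaged_perceptron_train(X, y, max_iter, rng=None):
    """AveragedPerceptronTrain: accumulate w after every sample and
    return the sum mu; sign(mu^T x) matches the prediction of the
    averaged weights, since positive scaling never changes the sign."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    mu = np.zeros(d)                       # running sum of weight vectors
    idx = np.arange(n)
    for _ in range(max_iter):
        rng.shuffle(idx)
        for i in idx:
            a = w @ X[i]
            if a * y[i] <= 0:
                w = w + y[i] * X[i]
            mu = mu + w                    # accumulate for every sample
    return mu

# Toy check on a linearly separable 2-D dataset (hypothetical data).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron_train(X, y, max_iter=10)
print(np.sign(X @ w))   # expected: [ 1.  1. -1. -1.]
```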

Runtime is $\displaystyle O(dN)$ per pass over the data: each activation $\displaystyle w^{\top}x$ and each update is an $\displaystyle O(d)$ vector operation, performed once per sample.