Perceptron

#Computers

Topics

$\displaystyle H=\left\{ h \mid h:\mathbb{X}\rightarrow \mathbb{Y},\ h(\mathbf{x})=\text{sign}(a) \right\}$

  • The hypothesis class: each hypothesis $\displaystyle h(x)$ produces a prediction $\displaystyle \hat{y}$, i.e. it is a function that predicts the label of a data point (a minimal sketch follows this list)
  • $\displaystyle a=w^{\top}x$ is the activation, the scalar score that gets thresholded (the activation itself, not an activation function)
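
A minimal NumPy sketch of one such hypothesis, assuming labels in $\{-1,+1\}$ and some given weight vector $w$ (the names here are illustrative, not from the notes):

```python
import numpy as np

def h(x: np.ndarray, w: np.ndarray) -> int:
    """One hypothesis from H: threshold the activation a = w^T x."""
    a = w @ x                     # a is the activation, a scalar score
    return 1 if a >= 0 else -1    # mapping sign(0) to +1 is a convention choice
```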

$$\displaystyle \text{Margin}(\mathcal{D},w)=\begin{cases}
\min_{(x_{n},y_{n})\in \mathcal{D}} \frac{1}{\left\lVert w\right\rVert}y_{n}w^{\top}x_{n} & \text{for separating hyperplane }w \\
-\infty & \text{else}
\end{cases}$$

  • The margin of a given hyperplane $\displaystyle w$ is the minimum distance between that hyperplane and any sample in $\displaystyle \mathcal{D}$
  • If $\displaystyle w$ does not separate the data, the margin is defined to be $\displaystyle -\infty$

$\displaystyle \text{Margin}(\mathcal{D})=\max_{w}\text{Margin}(\mathcal{D},w)$

  • The margin of a dataset is the largest margin achieved by any hyperplane, i.e. the margin of the maximum-margin separating hyperplane
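
A short NumPy sketch of $\text{Margin}(\mathcal{D},w)$ for a fixed $w$, assuming the rows of `X` are samples, `y` holds labels in $\{-1,+1\}$, and $w \neq 0$ (names are illustrative):

```python
import numpy as np

def margin(X: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    """Margin(D, w): smallest signed distance from a sample to the
    hyperplane w, or -inf when w does not separate the data."""
    scores = y * (X @ w) / np.linalg.norm(w)   # (1/||w||) y_n w^T x_n per sample
    m = float(scores.min())
    return m if m > 0 else float("-inf")       # non-separating => -inf
```

Maximizing this quantity over $w$ (e.g. with a max-margin solver such as an SVM) would yield $\text{Margin}(\mathcal{D})$.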

Training

  • PerceptronTrain(data = N samples/instances = $\displaystyle \left\{ (x_{1},y_{1}),\ldots ,(x_{N},y_{N}) \right\}$, maxIter) (see the Python sketch after this list)
    • initialize $\displaystyle w:=0$
    • for i = 1 to maxIter
      • Potentially shuffle data here
      • for (x, y) in data
        • $\displaystyle a:=w^{\top}x$
        • if $\displaystyle ay\leq 0$ (i.e. the sample is misclassified or on the boundary) then
          • $\displaystyle w:=w+yx$
    • return $\displaystyle w$
  • AveragedPerceptronTrain(data = N samples/instances = $\displaystyle \left\{ (x_{1},y_{1}),\ldots ,(x_{N},y_{N}) \right\}$, maxIter)
    • initialize $\displaystyle w:=0$, $\displaystyle \mu:=0$
    • for i = 1 to maxIter
      • Potentially shuffle data here
      • for (x, y) in data
        • $\displaystyle a:=w^{\top}x$
        • if $\displaystyle ay\leq 0$ (i.e. the sample is misclassified or on the boundary) then
          • $\displaystyle w:=w+yx$
            • This effectively moves the decision boundary defined by $\displaystyle w$ toward classifying this example correctly
        • $\displaystyle \mu:=\mu+w$ (accumulated after every sample, not only after updates)
    • return $\displaystyle \mu$
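
A runnable NumPy sketch of both procedures, assuming labels in $\{-1,+1\}$ and zero-initialized weights (the notes do not fix the initialization; function and variable names are illustrative):

```python
import numpy as np

def perceptron_train(X, y, max_iter, rng=None):
    """PerceptronTrain: update w on every misclassified sample.
    X is (N, d); y is (N,) with labels in {-1, +1}."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)                        # zero initialization (assumed)
    idx = np.arange(n)
    for _ in range(max_iter):
        rng.shuffle(idx)                   # the optional shuffle step
        for i in idx:
            a = w @ X[i]                   # activation a = w^T x
            if a * y[i] <= 0:              # misclassified (or on boundary)
                w = w + y[i] * X[i]        # move boundary toward this sample
    return w

def averaged_perceptron_train(X, y, max_iter, rng=None):
    """AveragedPerceptronTrain: accumulate w after every sample and
    return the sum mu; sign(mu^T x) matches the prediction of the
    averaged weights, since positive scaling never changes the sign."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    w = np.zeros(d)
    mu = np.zeros(d)                       # running sum of weight vectors
    idx = np.arange(n)
    for _ in range(max_iter):
        rng.shuffle(idx)
        for i in idx:
            a = w @ X[i]
            if a * y[i] <= 0:
                w = w + y[i] * X[i]
            mu = mu + w                    # accumulate for every sample
    return mu

# Toy check on a linearly separable 2-D dataset (hypothetical data).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = perceptron_train(X, y, max_iter=10)
print(np.sign(X @ w))   # expected: [ 1.  1. -1. -1.]
```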

Runtime is $\displaystyle O(dN)$ per pass over the data: each activation $\displaystyle w^{\top}x$ and each update is an $\displaystyle O(d)$ vector operation, performed once per sample.