Topics
H = { h | h : X → Y, h(x) = sign(a) }
- The hypothesis class: the set of functions h we hope will output ŷ, i.e. functions that predict the label of a data point
- a = wᵀx is the activation; the prediction is its sign
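The hypothesis above can be sketched directly: compute the activation a = wᵀx and take its sign. A minimal example (the weight and feature vectors here are made up for illustration):

```python
import numpy as np

def predict(w, x):
    """Hypothesis h(x) = sign(a) with activation a = w^T x."""
    a = w @ x                    # activation: inner product of weights and features
    return 1 if a > 0 else -1    # sign; a == 0 is treated as the negative class here

w = np.array([1.0, -2.0])
print(predict(w, np.array([3.0, 1.0])))   # activation 1.0  -> 1
print(predict(w, np.array([1.0, 1.0])))   # activation -1.0 -> -1
```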
Margin
Margin(D, w) = { min_{(x,y)∈D} y · (wᵀx) / ‖w‖   if w is a separating hyperplane for D;   −∞   otherwise }
- The margin of a hyperplane w is the minimum distance between that hyperplane and any sample in D
- If w does not separate the data, its margin is defined to be −∞
Margin(D) = max_w Margin(D, w)
- The margin of a dataset is the margin of the hyperplane with the greatest margin
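A sketch of both definitions, assuming NumPy. Margin(D, w) follows the formula directly; for Margin(D) the exact maximization over w is the hard-margin SVM problem, so this only approximates it with random search over directions. The 2D dataset is made up (its true margin is 1, achieved by w = [1, 0]):

```python
import numpy as np

def margin(D, w):
    """Margin(D, w): minimum of y * w^T x / ||w|| over D, or -inf if w
    does not separate the data (some sample has y * w^T x <= 0)."""
    m = min(y * (w @ x) / np.linalg.norm(w) for x, y in D)
    return m if m > 0 else -np.inf

def dataset_margin_estimate(D, n_trials=10_000, seed=0):
    """Rough estimate of Margin(D) = max_w Margin(D, w) by sampling random
    directions -- an illustrative sketch, not the exact maximization."""
    rng = np.random.default_rng(seed)
    d = len(D[0][0])
    return max(margin(D, rng.standard_normal(d)) for _ in range(n_trials))

# Hypothetical dataset: margin of the best hyperplane w = [1, 0] is min(2, 1) = 1
D = [(np.array([2.0, 0.0]), 1), (np.array([-1.0, 0.0]), -1)]
```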
Training
- PerceptronTrain(data = N samples/instances = {(x_1, y_1), …, (x_N, y_N)}, maxIter)
    - w := 0 (initialize the weight vector)
    - for i = 1 through maxIter
        - Potentially shuffle data here
        - for (x, y) in data
            - a := wᵀx
            - if ay ≤ 0 (i.e. the sample is misclassified) then
                - w := w + yx
    - return w
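The pseudocode above translates almost line for line into NumPy; the tiny separable dataset at the bottom is made up for illustration:

```python
import numpy as np

def perceptron_train(data, max_iter):
    """Perceptron training loop: w starts at 0 and is nudged by y*x on mistakes."""
    w = np.zeros(len(data[0][0]))
    for _ in range(max_iter):        # optionally shuffle data each pass
        for x, y in data:
            a = w @ x                # activation
            if a * y <= 0:           # misclassified (ay <= 0 counts the boundary too)
                w = w + y * x        # move the boundary toward this example
    return w

# Hypothetical linearly separable data:
data = [(np.array([1.0, 1.0]), 1), (np.array([-1.0, -1.0]), -1)]
w = perceptron_train(data, 10)
```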
- AveragedPerceptronTrain(data = N samples/instances = {(x_1, y_1), …, (x_N, y_N)}, maxIter)
    - w := 0, μ := 0
    - for i = 1 through maxIter
        - Potentially shuffle here
        - for (x, y) in data
            - a := wᵀx
            - if ay ≤ 0 (i.e. the sample is misclassified) then
                - w := w + yx
                - This effectively moves the boundary plane defined by w toward classifying this example correctly
            - μ := μ + w (accumulated after every example, whether or not w was updated)
    - return μ
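The averaged variant can be sketched the same way; μ accumulates w after every example, and since rescaling a weight vector by a positive constant doesn't change its decision boundary, returning the sum is equivalent to returning the average. The example data is made up:

```python
import numpy as np

def averaged_perceptron_train(data, max_iter):
    """Averaged perceptron: same update as the plain perceptron, but the
    returned vector is the accumulation of w over all steps."""
    d = len(data[0][0])
    w, mu = np.zeros(d), np.zeros(d)
    for _ in range(max_iter):       # optionally shuffle each pass
        for x, y in data:
            a = w @ x               # activation
            if a * y <= 0:          # misclassified
                w = w + y * x
            mu = mu + w             # accumulate even when no update happened
    return mu

# Hypothetical linearly separable data:
data = [(np.array([1.0, 1.0]), 1), (np.array([-1.0, -1.0]), -1)]
mu = averaged_perceptron_train(data, 2)
```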
- Runtime: O(dN) per pass through the data (d = number of features, N = number of samples), so O(dN · maxIter) overall