A boosting algorithm that minimizes 0-1 loss

  • $\epsilon_t = \sum_{i=1}^{n} w_i^{(t)} \, \mathbb{1}[h_t(x_i) \neq y_i]$ — Error (0-1 loss specifically) for the weight given to $h_t$, the weak (i.e. base) classifier for this epoch $t$ in the training cycle
  • $i$ corresponds to a particular training example $(x_i, y_i)$
  • $w_i^{(t)}$ is the weight for this epoch for a particular sample
    • Set to $\frac{1}{n}$ for $t = 1$
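As a minimal sketch of this computation (the labels, predictions, and weights below are made up), $\epsilon_t$ is just the total weight of the misclassified samples:

```python
import numpy as np

# Hypothetical data: y holds +/-1 labels, preds holds h_t(x_i) for each sample
y = np.array([1, 1, -1, -1, 1])
preds = np.array([1, -1, -1, 1, 1])
w = np.full(5, 1 / 5)          # w_i = 1/n for epoch t = 1

eps = w[preds != y].sum()      # total weight of the mistakes
print(eps)                     # 0.4
```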

  • $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$ — Contribution of $h_t$ to the final classifier
  • $\alpha_t < 0$ for large $\epsilon_t$ (i.e. $\epsilon_t > \frac{1}{2}$)
    • However, negative values for $\alpha_t$ are always made positive by flipping the predictions of the weak classifier
  • $\alpha_t = 0$ for $\epsilon_t = \frac{1}{2}$, or when guessing randomly
  • $\alpha_t > 0$ for $\epsilon_t < \frac{1}{2}$
  • $\alpha_t \to \infty$ for $\epsilon_t \to 0$
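A few values of $\alpha_t$ make these regimes concrete (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def alpha(eps):
    # alpha_t = (1/2) ln((1 - eps_t) / eps_t)
    return 0.5 * np.log((1 - eps) / eps)

print(alpha(0.6))    # -0.203: worse than chance, so flip the stump instead
print(alpha(0.5))    #  0.0:   random guessing contributes nothing
print(alpha(0.1))    #  1.099: low error earns a large vote
print(alpha(1e-6))   #  6.9:   alpha -> infinity as eps -> 0
```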

  • $w_i^{(t+1)} = \frac{w_i^{(t)} \, e^{-\alpha_t y_i h_t(x_i)}}{Z_t}$ — Updated weights, where $Z_t$ normalizes the weights to sum to 1. We compute this step $T$ times, the number of epochs we'd like, until we've minimized the error
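One reweighting step on the same made-up data as above (a sketch; $\epsilon_t = 0.4$ carries over from the error example):

```python
import numpy as np

y = np.array([1, 1, -1, -1, 1])
preds = np.array([1, -1, -1, 1, 1])
w = np.full(5, 1 / 5)
alpha_t = 0.5 * np.log((1 - 0.4) / 0.4)   # eps_t = 0.4 from the error example

w = w * np.exp(-alpha_t * y * preds)      # mistakes (y_i h_t(x_i) = -1) grow
w /= w.sum()                              # divide by the normalizer Z_t
print(w.round(3))                         # [0.167 0.25  0.167 0.25  0.167]
```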

  • $H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$ — Final classifier
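A sketch of the final vote, assuming training produced a list of $(\alpha_t, h_t)$ pairs where each $h_t$ maps $x$ to $\pm 1$:

```python
def adaboost_predict(ensemble, x):
    # ensemble: hypothetical list of (alpha_t, h_t) pairs from training
    score = sum(a * h(x) for a, h in ensemble)
    return 1 if score >= 0 else -1

# e.g. two hypothetical stumps voting on a scalar x
ensemble = [(0.8, lambda x: 1 if x > 2 else -1),
            (0.3, lambda x: 1 if x > 5 else -1)]
print(adaboost_predict(ensemble, 3.0))    # 0.8 - 0.3 = 0.5 > 0, so +1
```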

Exponential Loss Minimization

  • $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$ — The Adaboost classifier for the current epoch equals the previous Adaboost classifier plus the current weighted base classifier
  • Adaboost minimizes the empirical risk of the exponential loss: $\ell(H) = \frac{1}{n} \sum_{i=1}^{n} e^{-y_i H(x_i)}$

  • $w_i^{(t)} \propto e^{-y_i H_{t-1}(x_i)}$ — The weight on each training example in the exponential loss of the current weak classifier is the exponential loss of our previous Adaboost classifier on that example
  • $H_{t-1}(x) = \sum_{s=1}^{t-1} \alpha_s h_s(x)$ is the Adaboost classifier for epoch $t - 1$
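A quick numeric check of this identity on made-up data (variable names are illustrative): if we skip the normalization by $Z_t$, the weights after epoch $t$ are exactly $\frac{1}{n} e^{-y_i H_t(x_i)}$, the per-sample exponential loss of the ensemble so far.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 6, 4
y = rng.choice([-1, 1], size=n)
preds = rng.choice([-1, 1], size=(T, n))   # h_t(x_i) for each epoch t

w = np.full(n, 1 / n)
H = np.zeros(n)                            # running score H_t(x_i)
for t in range(T):
    eps = np.clip(w[preds[t] != y].sum() / w.sum(), 1e-12, 1 - 1e-12)
    a = 0.5 * np.log((1 - eps) / eps)
    w = w * np.exp(-a * y * preds[t])      # deliberately not normalized
    H += a * preds[t]

print(np.allclose(w, np.exp(-y * H) / n))  # True: w_i = (1/n) exp(-y_i H_t(x_i))
```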

Training

  • AdaboostTrain($D$):
    • For each of the $d$ features, sort the data points by that feature's value
      • This is the pre-sorting step and contributes $O(dn \log n)$ to the runtime
    • for $t$ in $1, \ldots, T$:
      • $h_t \leftarrow$ the decision stump (feature, threshold) that minimizes the weighted error $\epsilon_t$
        • Can be found in $O(dn)$ time by going through each of the $d$ features, starting with the smallest threshold, calculating the error, moving the threshold to the next data point as determined by our pre-sorting, updating the error in $O(1)$ time, and so on until we find the threshold with the smallest error (see the sketch at the end of this section)
      • Compute $\alpha_t$ from $\epsilon_t$ and update the weights $w_i^{(t+1)}$ as above

  • $O(dn \log n + Tdn)$ training runtime using decision stumps as the base classifier: $O(dn \log n)$ for pre-sorting plus $O(dn)$ per epoch over $T$ epochs
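Putting the pieces together, here is a minimal runnable sketch of this procedure (the function names, tie handling, and the polarity trick for flipping worse-than-chance stumps are my own choices, not fixed by the notes):

```python
import numpy as np

def adaboost_train(X, y, T):
    """Train Adaboost with decision stumps. X: (n, d) array, y: +/-1 labels."""
    n, d = X.shape
    order = np.argsort(X, axis=0)        # pre-sorting: O(dn log n), done once
    w = np.full(n, 1 / n)                # w_i = 1/n for epoch t = 1
    ensemble = []                        # (alpha, feature, threshold, polarity)

    for _ in range(T):                   # each epoch costs O(dn)
        best_err, best = 1.0, None
        for f in range(d):
            idx = order[:, f]
            vals = X[idx, f]
            # Threshold below all points: the stump "+1 if x_f > theta"
            # predicts +1 everywhere, so its error is the weight of the -1s.
            eps = w[y == -1].sum()
            candidates = [(eps, vals[0] - 1.0)]
            for j, i in enumerate(idx):  # slide theta past one point: O(1) update
                eps += w[i] if y[i] == 1 else -w[i]
                # a threshold is only valid once all ties at this value pass
                if j == n - 1 or vals[j + 1] != vals[j]:
                    candidates.append((eps, vals[j]))
            for eps, theta in candidates:
                # eps > 1/2 means the flipped stump (polarity -1) does better
                polarity, err = (1, eps) if eps <= 0.5 else (-1, 1 - eps)
                if err < best_err:
                    best_err, best = err, (f, theta, polarity)

        f, theta, polarity = best
        h = polarity * np.where(X[:, f] > theta, 1, -1)
        eps = np.clip(best_err, 1e-12, 0.5)    # guard against eps = 0
        alpha = 0.5 * np.log((1 - eps) / eps)
        ensemble.append((alpha, f, theta, polarity))
        w = w * np.exp(-alpha * y * h)   # mistakes gain weight
        w /= w.sum()                     # normalize by Z_t

    return ensemble

def predict(ensemble, X):
    """Final classifier H(x) = sign(sum_t alpha_t h_t(x))."""
    score = sum(a * p * np.where(X[:, f] > th, 1, -1)
                for a, f, th, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Tiny smoke test on separable made-up data
X = np.array([[0.], [1.], [2.], [3.]])
y = np.array([-1, -1, 1, 1])
print(predict(adaboost_train(X, y, T=3), X))   # [-1 -1  1  1]
```

The sort happens once up front; each epoch only sweeps the pre-sorted order with constant-time error updates, which is where the $O(dn \log n + Tdn)$ total comes from.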