Misc

  • , circular

Validation

    • $K$ = number of chunks the data is split into
    • For each chunk $k$, a model is fitted to the training set of all chunks that aren’t chunk $k$, which serves as the validation set
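The chunking scheme above can be sketched as a minimal $K$-fold split. This is an illustrative sketch, not a specific library's API; the contiguous-chunk fold assignment is an assumed choice (shuffling first is also common):

```python
def kfold_splits(data, k):
    """Yield (train, validation) pairs: each chunk in turn is the
    validation set, and the model is fitted to all remaining chunks."""
    n = len(data)
    # boundaries of k contiguous chunks of (nearly) equal size
    bounds = [round(i * n / k) for i in range(k + 1)]
    for i in range(k):
        val = data[bounds[i]:bounds[i + 1]]
        train = data[:bounds[i]] + data[bounds[i + 1]:]
        yield train, val

for train, val in kfold_splits(list(range(6)), 3):
    print(train, val)  # each index appears in exactly one validation set
```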

Decision Trees

K-Nearest Neighbors

    • $\text{nn}(k)$ gives the index in the dataset of the $k$th nearest neighbor of $x$, the data point we’re interested in classifying
    • $\text{nn}$ is an array of indices, one for each data sample point
    • We use the square of the L2 norm, $\lVert x - x_{n}\rVert^{2}$, to calculate distance here, but other measures could work
    • $\text{knn}(K)$ gives the set of indices of the $K$ nearest neighbors of $x$
    • $\text{votes}(c)$ gives the number of “votes” for a particular class/label $c$
    • We iterate over each of the $K$-nearest neighbors and increment $\text{votes}(c)$ by one every time the neighbor’s label equals the class of interest
    • The prediction $\hat{y}$ is the class that maximizes the number of votes: $\hat{y}=\arg\max_{c}\text{votes}(c)$

Steps To Classify

  1. Initialize
    1. Place the data point in a feature space
  2. Calculate Distance
    1. Find the Euclidean distance from the data point to all other labeled data points
    2. May have to normalize the data point’s values along different coordinates, e.g. to Z-scores for each coordinate
  3. Sort Distances
    1. Sort distances of the data point to the other labeled data points in increasing order
    2. For classification, the most common class among the $k$-nearest neighbors determines the data point’s class
    3. For regression, use the average of the $k$-nearest neighbors’ labels
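The steps above can be sketched as a small classifier. The function name and the toy data are illustrative; squared L2 distance is used, matching the notes:

```python
from collections import Counter

def knn_classify(X, y, x_new, k):
    """Classify x_new by majority vote among its k nearest neighbors."""
    # Steps 1–2: squared Euclidean distance from x_new to every labeled point
    d2 = [sum((a - b) ** 2 for a, b in zip(x, x_new)) for x in X]
    # Step 3: sort indices by increasing distance and keep the k nearest
    nn = sorted(range(len(X)), key=lambda i: d2[i])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y[i] for i in nn)
    return votes.most_common(1)[0][0]

X = [(0, 0), (0, 1), (5, 5), (6, 5)]
y = ["a", "a", "b", "b"]
print(knn_classify(X, y, (1, 0), k=3))  # → "a"
```

For regression, the final vote would instead be replaced by the mean of the neighbors’ labels.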

Perceptron

  • $$\displaystyle \text{Margin}(\mathcal{D},w)=\begin{cases}
\min_{(x_{n},y_{n})\in \mathcal{D}} \frac{1}{\left\lVert w\right\rVert}y_{n}w^{T}x_{n} & \text{for separating hyperplane }w \\
-\infty & \text{else}
\end{cases}$$
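A direct numeric translation of this definition (function name and toy dataset are illustrative):

```python
import math

def margin(D, w):
    """Margin(D, w): the minimum of y_n * w^T x_n / ||w|| over the dataset
    when w separates the data, and -infinity otherwise."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    dists = [y * sum(wi * xi for wi, xi in zip(w, x)) / norm for x, y in D]
    # w is a separating hyperplane iff every signed distance is positive
    return min(dists) if all(d > 0 for d in dists) else -math.inf

D = [([2.0, 1.0], +1), ([-1.0, -2.0], -1)]
print(margin(D, [1.0, 1.0]))   # separating: the smallest signed distance
print(margin(D, [-1.0, -1.0]))  # not separating: -inf
```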

Mistake Bound Theorem

  • For a dataset $\mathcal{D}$ with $\lVert x_{n}\rVert \leq R$ and labels $y_{n}\in\{-1,+1\}$
  • Suppose there exists a unit vector $w^{*}$ ($\lVert w^{*}\rVert = 1$) and a $\gamma > 0$ such that $y_{n}(w^{*})^{T}x_{n}\geq\gamma$ for all $n$
    • $\gamma$ can be thought of as a lower bound achieved by a candidate $w^{*}$ for the margin of the dataset
  • Then the PerceptronTrainingAlgorithm will make at most $R^{2}/\gamma^{2}$ mistakes on the training sequence
  • Proof Preliminaries: each mistake triggers the update $w \leftarrow w + y_{n}x_{n}$
  • Proof (1/3): After $M$ mistakes, $w^{T}w^{*}\geq M\gamma$, by induction from the update rule and using $y_{n}(w^{*})^{T}x_{n}\geq\gamma$
  • Proof (2/3): After $M$ mistakes, $\lVert w\rVert^{2}\leq MR^{2}$
  • Proof (3/3): Combining via Cauchy–Schwarz, $M\gamma\leq w^{T}w^{*}\leq\lVert w\rVert\lVert w^{*}\rVert\leq\sqrt{M}R$, so $M\leq R^{2}/\gamma^{2}$
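A minimal sketch of the perceptron training loop the theorem refers to, counting mistakes (function name, epoch cap, and toy data are illustrative assumptions; no bias term is used):

```python
def perceptron_train(D, epochs=100):
    """On each mistake (y * w^T x <= 0), update w <- w + y * x.
    Returns the learned weights and the number of mistakes made."""
    d = len(D[0][0])
    w = [0.0] * d
    mistakes = 0
    for _ in range(epochs):
        clean_pass = True
        for x, y in D:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
                clean_pass = False
        if clean_pass:  # converged: no mistakes in a full pass
            break
    return w, mistakes

D = [([2.0, 1.0], 1), ([1.0, 3.0], 1), ([-1.0, -1.0], -1), ([-2.0, -3.0], -1)]
w, m = perceptron_train(D)
print(w, m)  # m should respect the R^2 / gamma^2 bound for this data
```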

Logistic Regression

Convexity

  • is convex
  • is convex
  • is convex
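The specific expressions in the bullets above were lost in export. As one example relevant to logistic regression, convexity of the logistic loss $\log(1+e^{-z})$ can be spot-checked numerically via its second derivative (the assumed loss form and helper names are illustrative):

```python
import math

def logistic_loss(z):
    # log(1 + e^(-z)): per-example logistic loss as a function of the
    # margin z = y * w^T x (assumed form; not taken from the notes)
    return math.log(1.0 + math.exp(-z))

def second_derivative(f, z, h=1e-3):
    # central finite-difference estimate of f''(z)
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / (h * h)

# convexity requires f''(z) >= 0 everywhere; spot-check a grid of points
print(all(second_derivative(logistic_loss, z) >= 0.0 for z in range(-10, 11)))
```

Analytically $\frac{d^{2}}{dz^{2}}\log(1+e^{-z})=\sigma(z)(1-\sigma(z))>0$, where $\sigma$ is the sigmoid, so the numeric check agrees.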

Linear Regression

    • $L(w,b)=\sum_{n}\left(y_{n}-(wx_{n}+b)\right)^{2}$ is the loss function for simple linear regression
    • Minimizing this by varying $w$ and/or $b$ gives the linear model that best fits the data
    • $y$ is the target vector
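Setting the partial derivatives of the sum-of-squares loss to zero gives the familiar closed form, sketched below (function name and toy data are illustrative):

```python
def fit_simple_linear(x, y):
    """Minimize L(w, b) = sum_n (y_n - (w*x_n + b))^2 in closed form:
    w = cov(x, y) / var(x), b = mean(y) - w * mean(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    w = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    b = my - w * mx
    return w, b

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
print(fit_simple_linear(x, y))  # → (2.0, 1.0)
```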

Regularized

Run Times

| Algorithm | Training Time Complexity | Test Time Complexity |
| --- | --- | --- |
| Decision Tree | $O(nd\log n)$ to $O(n^{2}d)$ (depends on splitting strategy) | $O(\log n)$ (balanced) or $O(n)$ (unbalanced) |
| K-Nearest Neighbors (KNN) | $O(1)$ (no training, just storing data) | $O(nd)$ (linear search) or $O(\log n)$ (KD-tree, low dimensions) |
| Perceptron | $O(nd)$ per epoch | $O(d)$ |
| Logistic Regression | $O(nd)$ per iteration (gradient-based) | $O(d)$ |
| Linear Regression (Analytical - Normal Equation) | $O(nd^{2}+d^{3})$ (matrix inversion) | $O(d)$ |
| Linear Regression (Gradient Descent) | $O(ndt)$ (depends on iterations $t$) | $O(d)$ |
| Full Batch Gradient Descent | $O(nd)$ per iteration | — |
| Stochastic Gradient Descent (SGD) | $O(d)$ per iteration | — |
| Mini-Batch Gradient Descent | $O(bd)$ per iteration (where $b$ is batch size) | — |
  • $n$ = number of training samples
  • $d$ = number of features (dimensions)
  • $t$ = number of iterations in gradient-based methods
  • $b$ = batch size for mini-batch gradient descent

Algorithms