Validation
- $k$-fold cross-validation: split the dataset into $k$ equally sized chunks
- $k$ = number of chunks
- For each chunk $i$, a model is fitted to the training set of all chunks that aren't chunk $i$, then evaluated on chunk $i$, the validation set (see the sketch below)
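A minimal sketch of this procedure in NumPy, assuming a hypothetical `make_model` factory that returns a fresh model with `fit(X, y)` and `predict(X)` methods:

```python
import numpy as np

def k_fold_cv(X, y, k, make_model):
    """Mean validation accuracy over k folds.

    make_model is a hypothetical factory returning a fresh model
    with fit(X, y) and predict(X) methods.
    """
    indices = np.random.permutation(len(X))
    chunks = np.array_split(indices, k)  # k roughly equal chunks
    scores = []
    for i in range(k):
        val_idx = chunks[i]  # chunk i is the validation set
        train_idx = np.concatenate([chunks[j] for j in range(k) if j != i])
        model = make_model()
        model.fit(X[train_idx], y[train_idx])  # fit on all chunks that aren't i
        scores.append(np.mean(model.predict(X[val_idx]) == y[val_idx]))
    return float(np.mean(scores))
```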
Decision Trees
K-Nearest Neighbors
- $\eta(x)=\operatorname{argsort}_{n}\left\lVert x-x_{n}\right\rVert_{2}^{2}$
- $\eta_{j}(x)$ gives the index in $\mathcal{D}$ of the $j$th nearest neighbor of $x$, the data point we're interested in classifying
- $\eta(x)$ is an array of indices, one for each data sample point
- We use the square of the L2 norm here to calculate distance, but other measures could work
- $N_{k}(x)=\{\eta_{1}(x),\dots,\eta_{k}(x)\}$
- Gives the set of indices of the $k$ nearest neighbors to $x$
- $v_{c}=\sum_{j=1}^{k}\mathbb{1}\left[y_{\eta_{j}(x)}=c\right]$
- Gives the number of "votes" for a particular class/label $c$
- We iterate over each of the $k$ nearest neighbors and increment $v_{c}$ by one every time the neighbor's label equals the class of interest
- $\hat{y}=\operatorname{argmax}_{c}\,v_{c}$
- The prediction is the class that maximizes the number of votes (see the sketch below)
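A minimal NumPy sketch of the prediction rule above; the names `eta`, `neighbors`, and `votes` mirror $\eta$, $N_{k}$, and $v_{c}$:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Classify x by majority vote among its k nearest neighbors."""
    # Squared L2 distance from x to every training point
    dists = np.sum((X_train - x) ** 2, axis=1)
    # eta: training-point indices sorted by distance; N_k = first k of them
    eta = np.argsort(dists)
    neighbors = eta[:k]
    # v_c: number of votes for each class among the k neighbors
    classes, votes = np.unique(y_train[neighbors], return_counts=True)
    # Prediction is the class maximizing the votes
    return classes[np.argmax(votes)]
```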
Steps To Classify
- Initialize
    - Place the data point in the feature space
- Calculate Distance
    - Find the Euclidean distance from the data point to all other labeled data points
    - May have to normalize the data point's values along different coordinates to Z-scores for each coordinate (see the sketch after this list)
- Sort Distance
    - Sort the distances from the data point to the other labeled data points in increasing order
- Predict
    - For classification, the most common class among the $k$ nearest neighbors determines the data point's class
    - For regression, use the average of the $k$ nearest neighbors' labels
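A short sketch of the per-coordinate Z-score normalization from the Calculate Distance step; statistics come from the training data only:

```python
import numpy as np

def z_score_normalize(X_train, X_test):
    """Scale each coordinate (feature) to zero mean and unit variance,
    using the training data's statistics for both splits."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```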
Perceptron
- The margin of a dataset $\mathcal{D}$ with respect to a hyperplane $w$ is the distance from the hyperplane to the closest point:
$$\displaystyle \text{Margin}(\mathcal{D},w)=\begin{cases}
\min_{(x_{n},y_{n})\in \mathcal{D}} \frac{1}{\left\lVert w\right\rVert}y_{n}w^{T}x_{n} & \text{for separating hyperplane }w \\
-\infty & \text{else}
\end{cases}$$
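A direct NumPy translation of this definition; it assumes labels $y_{n}\in\{-1,+1\}$ and returns $-\infty$ when $w$ fails to separate the data:

```python
import numpy as np

def margin(X, y, w):
    """Margin(D, w): the smallest signed distance y_n * w^T x_n / ||w||,
    or -inf if w does not separate the data."""
    scores = y * (X @ w) / np.linalg.norm(w)
    if np.all(scores > 0):  # w separates the data
        return float(np.min(scores))
    return float("-inf")
```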
Mistake Bound Theorem
- For a dataset $\mathcal{D}=\{(x_{n},y_{n})\}_{n=1}^{N}$ with $x_{n}\in\mathbb{R}^{D}$ and labels $y_{n}\in\{-1,+1\}$
- Suppose $\left\lVert x_{n}\right\rVert\le R$ for all $n$, and there exists a unit-norm $w^{*}$ such that $y_{n}(w^{*})^{T}x_{n}\ge\gamma$ for all $n$
- $\gamma$ can be thought of as the margin of a candidate hyperplane $w^{*}$, a lower bound for the margin of the dataset
- Then the PerceptronTrainingAlgorithm will make at most $\frac{R^{2}}{\gamma^{2}}$ mistakes on the training sequence (sketched below)
- Proof Preliminaries: let $w^{(k)}$ be the weight vector after the $k$th mistake, with $w^{(0)}=\mathbf{0}$; a mistake on $(x_{n},y_{n})$ triggers the update $w^{(k)}=w^{(k-1)}+y_{n}x_{n}$
- Proof (1/3): After $k$ mistakes, $(w^{*})^{T}w^{(k)}\ge k\gamma$, by induction from $w^{(k)}=w^{(k-1)}+y_{n}x_{n}$ and using $y_{n}(w^{*})^{T}x_{n}\ge\gamma$
- Proof (2/3): After $k$ mistakes, $\left\lVert w^{(k)}\right\rVert^{2}\le kR^{2}$, since on a mistake $y_{n}(w^{(k-1)})^{T}x_{n}\le 0$ and $\left\lVert x_{n}\right\rVert^{2}\le R^{2}$
- Proof (3/3): Combining, $k\gamma\le(w^{*})^{T}w^{(k)}\le\left\lVert w^{(k)}\right\rVert\le\sqrt{k}\,R$ (by Cauchy-Schwarz and $\left\lVert w^{*}\right\rVert=1$), so $k\le\frac{R^{2}}{\gamma^{2}}$
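A minimal sketch of the perceptron training algorithm the theorem bounds, assuming labels in $\{-1,+1\}$ and any bias folded into the features:

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """Perceptron: update w only on mistakes; returns w and the mistake count.
    On separable data with margin gamma and ||x_n|| <= R, the mistake
    bound guarantees at most (R / gamma)^2 mistakes in total."""
    w = np.zeros(X.shape[1])  # w^(0) = 0
    mistakes = 0
    for _ in range(epochs):
        made_mistake = False
        for x_n, y_n in zip(X, y):
            if y_n * (w @ x_n) <= 0:  # mistake on (x_n, y_n)
                w = w + y_n * x_n     # w^(k) = w^(k-1) + y_n x_n
                mistakes += 1
                made_mistake = True
        if not made_mistake:  # converged: a full pass with no mistakes
            break
    return w, mistakes
```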
Logistic Regression
Convexity
- $f(z)=\log(1+e^{z})$ is convex
- A convex function composed with an affine function of $w$, such as $\log(1+e^{-y_{n}w^{T}x_{n}})$, is convex
- A nonnegative sum of convex functions is convex, so the logistic regression negative log-likelihood is convex (see the sketch below)
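A small sketch of this convex negative log-likelihood and its gradient, assuming labels in $\{-1,+1\}$:

```python
import numpy as np

def logistic_loss(w, X, y):
    """Negative log-likelihood: sum of log(1 + exp(-y_n w^T x_n)).
    Convex in w, since log(1 + e^z) is convex and each term composes
    it with an affine function of w."""
    z = -y * (X @ w)
    return float(np.sum(np.logaddexp(0.0, z)))  # stable log(1 + e^z)

def logistic_grad(w, X, y):
    """Gradient of the loss above with respect to w."""
    z = -y * (X @ w)
    sigma = 1.0 / (1.0 + np.exp(-z))  # sigmoid(z)
    return X.T @ (-y * sigma)
```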
Linear Regression
- $\ell(w,b)=\sum_{n=1}^{N}\left(y_{n}-(w^{T}x_{n}+b)\right)^{2}$
- The loss function for simple linear regression
- Minimizing this by varying $w$ and/or $b$ gives the linear model that best fits the data
- $y$ is the target vector
Regularized
- Ridge regression adds a penalty term to the loss, minimizing $\ell(w,b)+\lambda\left\lVert w\right\rVert_{2}^{2}$ to discourage large weights (see the sketch below)
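A minimal closed-form sketch covering both cases: `lam = 0` recovers ordinary least squares, `lam > 0` gives ridge regression; the bias term is left unpenalized:

```python
import numpy as np

def fit_linear_regression(X, y, lam=0.0):
    """Minimize ||X_b theta - y||^2 + lam * ||w||^2 in closed form.
    A column of ones is prepended so theta = [b, w]."""
    X_b = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend bias feature
    d = X_b.shape[1]
    I = np.eye(d)
    I[0, 0] = 0.0  # don't penalize the bias term
    # Normal equation (ridge form): (X^T X + lam I) theta = X^T y
    theta = np.linalg.solve(X_b.T @ X_b + lam * I, X_b.T @ y)
    return theta
```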
Run Times
| Algorithm | Training Time Complexity | Test Time Complexity |
|---|---|---|
| Decision Tree | $O(nd\log n)$ to $O(n^{2}d)$ (depends on splitting strategy) | $O(\log n)$ (balanced) or $O(n)$ (unbalanced) |
| K-Nearest Neighbors (KNN) | $O(1)$ (no training, just storing data) | $O(nd)$ (linear search) or $O(d\log n)$ (KD-tree, low dimensions) |
| Perceptron | $O(nd)$ per epoch | $O(d)$ |
| Logistic Regression | $O(nd)$ per iteration (gradient-based) | $O(d)$ |
| Linear Regression (Analytical - Normal Equation) | $O(nd^{2}+d^{3})$ (matrix inversion) | $O(d)$ |
| Linear Regression (Gradient Descent) | $O(tnd)$ (depends on iterations $t$) | $O(d)$ |
| Full Batch Gradient Descent | $O(nd)$ per iteration | n/a |
| Stochastic Gradient Descent (SGD) | $O(d)$ per iteration | n/a |
| Mini-Batch Gradient Descent | $O(bd)$ per iteration (where $b$ is batch size) | n/a |
- $n$ = number of training samples
- $d$ = number of features (dimensions)
- $t$ = number of iterations in gradient-based methods
- $b$ = batch size for mini-batch gradient descent
Algorithms
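A minimal sketch contrasting the three gradient descent variants from the run-time table, using the averaged squared loss on a linear model; `batch_size` selects the variant:

```python
import numpy as np

def squared_loss_grad(w, X, y):
    """Gradient of (1/2)||Xw - y||^2 averaged over the batch."""
    return X.T @ (X @ w - y) / len(y)

def gradient_descent(X, y, lr=0.1, iters=100, batch_size=None):
    """batch_size=None -> full batch, O(n d) per step;
    batch_size=1 -> SGD, O(d) per step;
    otherwise mini-batch, O(b d) per step."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        if batch_size is None:
            idx = np.arange(n)  # full batch: use every sample
        else:
            idx = rng.choice(n, size=batch_size, replace=False)
        w -= lr * squared_loss_grad(w, X[idx], y[idx])
    return w
```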